1. Introduction. For a Hermitian operator $A$ in a finite dimensional inner product space, the Rayleigh-Ritz method finds the stationary values, called "Ritz values," of the Rayleigh quotient $\lambda(x) = (x, Ax)/(x, x)$ on a given subspace as approximations to eigenvalues of $A$. If this "trial" subspace is $A$-invariant, i.e., invariant with respect to $A$, the Ritz values are exactly some of the eigenvalues of $A$. Given two finite dimensional subspaces $\mathcal X$ and $\mathcal Y$ of the same dimension, such that $\mathcal X$ is $A$-invariant, the absolute changes in the Ritz values of $A$ with respect to $\mathcal X$ compared to the Ritz values with respect to $\mathcal Y$ represent the absolute eigenvalue approximation error.
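The following is a minimal NumPy sketch of the Rayleigh-Ritz method. It is an illustration, not the authors' code: the helper `ritz_values` is hypothetical, and the random symmetric matrix is arbitrary test data. The sketch checks the claim above that an invariant trial subspace reproduces eigenvalues exactly, while a perturbed subspace only approximates them.

```python
import numpy as np


def ritz_values(A, Y):
    """Hypothetical helper: Ritz values of Hermitian A on span(Y), decreasing.

    The columns of Y are orthonormalized first, so Q^H A Q represents the
    Rayleigh-Ritz operator (P_Y A)|_Y in that basis.
    """
    Q, _ = np.linalg.qr(Y)
    return np.linalg.eigvalsh(Q.conj().T @ A @ Q)[::-1]


rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                      # illustrative random symmetric matrix
evals, evecs = np.linalg.eigh(A)       # eigenvalues in increasing order

X = evecs[:, [1, 3]]                   # A-invariant trial subspace
print(ritz_values(A, X))               # equals evals[[3, 1]] up to rounding

Y = X + 0.1 * rng.standard_normal((n, 2))   # perturbed, not A-invariant
print(ritz_values(A, Y))               # only approximations of eigenvalues
```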

A priori error bounds for eigenvalues approximated by the Ritz values form one of 
the classical subjects in numerical linear algebra and approximation theory. Such error 
bounds are used, e.g., to estimate convergence rates of iterative methods for matrix 
eigenvalue problems. In approximation theory, the Rayleigh-Ritz method is the most 
common technique of approximating eigenvalues and eigenvectors of operators, e.g., 
[15, 19]; and a priori error bounds characterize the approximation quality. Many a 
priori bounds are known; see, e.g., [14] and references there. 

A priori error bounds for eigenvalues in [14] are based on the concept of angles 
between subspaces — one of the major ideas in multivariate statistics, closely related to 
canonical correlations. This concept also has applications in linear functional analysis 
and operator theory. The use of angles between subspaces for eigenvalue bounds is 
quite natural and may result in elegant and sharp estimates. 

Majorization is another classical area of mathematics with numerous applica- 
tions, in particular, for estimates involving eigenvalues and singular values. This 
paper includes all the necessary material on majorization, but should not serve as an
introduction to the subject. We follow and refer the reader to [4, 6, 7, 17], where 
background and references to original proofs can be found. 

In the pioneering results of [5], majorization is applied to bound eigenvalue errors a posteriori in the framework of angles between subspaces. A similar approach for a priori Rayleigh-Ritz error bounds is first developed in [1], where, e.g., a bound with a sharp constant is proved if $\mathcal X$ corresponds to a contiguous set of extreme eigenvalues of $A$.

Our first major bound, (2.2) of Theorem 2.1, extends the result of [1] to the general case of an arbitrary $A$-invariant subspace $\mathcal X$, which solves [1, Conjecture 3.1]. Moreover, our new proof, with small modifications, also covers (2.1), the main result of [11, 12]. Thus, our two bounds (2.1) and (2.2) of Theorem 2.1 supersede all main Rayleigh-Ritz error bounds of [1, 11, 12]. Our proof is based on a new generalized pinching inequality for singular values and eigenvalues, Theorem 4.5, which is a natural extension of the standard pinching inequality, e.g., [4, Problem II.5.4].

Next, in Theorem 2.2, for the particular case of extreme eigenvalues, we improve bound (2.2) by replacing the scalar constant on its right-hand side with a vector of constants. This is a delicate result: if one divides both sides of the improved bound, (2.5), by the vector of the constants, the statement no longer holds.

Our second main result, Theorem 2.3, is a majorization Rayleigh-Ritz error bound of multiplicative type, which deals with the products of the errors rather than the sums. It allows us to establish majorization bounds for the relative errors in Theorem 2.5. Finally, we extend our bounds to the case $\dim\mathcal X \le \dim\mathcal Y < \infty$ in infinite dimensional Hilbert spaces, preparing for section 3.

We apply our Rayleigh-Ritz majorization error bounds in the context of the finite element method (FEM) in section 3, and briefly show how they improve the constant in a known FEM eigenvalue error bound from [14].

There are numerous traditional bounds in the form of vector inequalities devel- 
oped over the decades by many mathematicians. The question concerning how the 
traditional and majorization bounds compare naturally arises. It appears feasible 
that properly formulated majorization bounds would eventually outperform and thus 
replace most conventional bounds. We already have examples of such a comparison 
in the present paper, but much more work in this direction is needed and will follow. 

2. Motivation, Conjectures, and Main Results. We introduce definitions 
and after a brief motivation present theorems and conjectures on a priori majorization 
eigenvalue error bounds using principal angles between subspaces. We describe related 
results of [1, 10, 12, 14] that precede the developments of the present paper. 

2.1. Basic definitions. We give only the definition of majorization here and refer the reader to subsection 4.1 for an overview of some facts on majorization that we use. For a real vector $x = [x_1, \dots, x_n]$ let $x^{\downarrow}$ be the vector obtained by rearranging the entries of $x$ in an algebraically decreasing order, $x_1^{\downarrow} \ge \cdots \ge x_n^{\downarrow}$. We use the term "decreasing" for "nonincreasing" and "increasing" for "nondecreasing", for conciseness. We say that vector $y$ weakly (sub-)majorizes vector $x$, and we use the notation $x \prec_w y$, if $\sum_{i=1}^k x_i^{\downarrow} \le \sum_{i=1}^k y_i^{\downarrow}$, $k = 1, \dots, n$. If in addition $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$, we write that vector $y$ (strongly) majorizes vector $x$, which is denoted by $x \prec y$. We overload the notation for scalars, e.g., $1$ may denote the vector of ones. Nonnegative vectors with different numbers of entries are compared by adding zero entries.
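The definitions of $\prec_w$ and $\prec$ translate directly into partial-sum checks. The sketch below uses hypothetical helper names, `weakly_majorized` and `majorized`, together with a small floating-point tolerance `tol`, which is an assumption and not part of the definition. Zero padding follows the convention stated above for nonnegative vectors.

```python
import numpy as np


def _padded_decreasing(x, n):
    """Pad x with zeros to length n and sort decreasingly (x -> x^down)."""
    x = np.asarray(x, dtype=float)
    return np.sort(np.pad(x, (0, n - x.size)))[::-1]


def weakly_majorized(x, y, tol=1e-12):
    """Hypothetical helper: True if x <_w y (partial sums of x^down <= those of y^down)."""
    n = max(np.size(x), np.size(y))
    xs, ys = _padded_decreasing(x, n), _padded_decreasing(y, n)
    return bool(np.all(np.cumsum(xs) <= np.cumsum(ys) + tol))


def majorized(x, y, tol=1e-12):
    """Hypothetical helper: True if x < y (weak majorization plus equal totals)."""
    return weakly_majorized(x, y, tol) and abs(np.sum(x) - np.sum(y)) <= tol


print(weakly_majorized([1, 1], [2, 0]), majorized([1, 1], [2, 0]))          # True True
print(weakly_majorized([0.5, 0.5], [2, 0]), majorized([0.5, 0.5], [2, 0]))  # True False
print(weakly_majorized([2, 0], [1, 1]))                                     # False
```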

Let $\mathcal H$ be a finite dimensional real or complex vector space, equipped with an inner product $(\cdot,\cdot)$. This abstract inner-product space setting is justified by generalizing, in subsection 2.8, some of our results to the case of infinite dimensional Hilbert spaces. We denote the vector of eigenvalues of a linear Hermitian operator $A : \mathcal H \to \mathcal H$ by $\Lambda(A)$, and keep the same notation for Hermitian matrices. We assume that the vector of eigenvalues $\Lambda$ is arranged in decreasing order, i.e., $\Lambda = \Lambda^{\downarrow}$. Multiple eigenvalues appear in $\Lambda$ repeatedly according to their multiplicities. We define singular values of a linear operator $B : \mathcal H \to \mathcal H$ as $S(B) = \Lambda\left(\sqrt{B^*B}\right)$, and keep the same notation for (rectangular) matrices, in which case the operator adjoint $B^*$ is replaced by the complex conjugate matrix transpose $B^H$.

Let $A : \mathcal H \to \mathcal H$ be a linear Hermitian operator and $P_{\mathcal X}$ and $P_{\mathcal Y}$ be orthoprojectors onto subspaces $\mathcal X$ and $\mathcal Y$ with $\dim\mathcal X \le \dim\mathcal Y$. We first give a brief description of Ritz values. We define the Rayleigh-Ritz operator $(P_{\mathcal X}A)|_{\mathcal X}$ on $\mathcal X$, where $(P_{\mathcal X}A)|_{\mathcal X}$ denotes the restriction of the operator $P_{\mathcal X}A$ to its invariant subspace $\mathcal X$. The eigenvalues $\Lambda((P_{\mathcal X}A)|_{\mathcal X})$ are called Ritz values of the operator $A$ with respect to the subspace $\mathcal X$. In the particular case $\mathcal X = \operatorname{span}\{x\}$ for a nonzero vector $x$, we define the Rayleigh quotient $\lambda(x) = (x, Ax)/(x, x) = \Lambda((P_{\mathcal X}A)|_{\mathcal X})$. If $\mathcal X$ is $A$-invariant, the Ritz values $\Lambda((P_{\mathcal X}A)|_{\mathcal X})$ are some of the eigenvalues of $A$. For two subspaces $\mathcal X$ and $\mathcal Y$ of the same dimension, such that $\mathcal X$ is $A$-invariant and $\mathcal Y$ approximates $\mathcal X$, it is natural to expect that the Ritz values $\Lambda((P_{\mathcal Y}A)|_{\mathcal Y})$ approximate the subset of the eigenvalues of $A$ given by $\Lambda((P_{\mathcal X}A)|_{\mathcal X})$. The absolute changes in the Ritz values of $A$ with respect to $\mathcal X$ compared to the Ritz values with respect to $\mathcal Y$ thus represent the absolute eigenvalue approximation error.

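The vector of cosines squared of principal angles from the subspace $\mathcal X$ to the subspace $\mathcal Y$ is defined by $\cos^2\Theta(\mathcal X,\mathcal Y) = \Lambda^{\uparrow}((P_{\mathcal X}P_{\mathcal Y})|_{\mathcal X})$, where the eigenvalues of $(P_{\mathcal X}P_{\mathcal Y})|_{\mathcal X}$ are rearranged in increasing order. The cosines are therefore increasing, and the angles, $0 \le \Theta(\mathcal X,\mathcal Y) \le \pi/2$, are decreasing. In other words, the cosines squared of the angles from $\mathcal X$ to $\mathcal Y$ are the Ritz values of the operator $P_{\mathcal Y}$ on the trial subspace $\mathcal X$. If $\dim\mathcal X = \dim\mathcal Y$, the definition becomes symmetric with respect to $\mathcal X$ and $\mathcal Y$, so it gives the angles between the subspaces $\mathcal X$ and $\mathcal Y$. In the particular case $\mathcal X = \operatorname{span}\{x\}$ and $\mathcal Y = \operatorname{span}\{y\}$ for unit vectors $x$ and $y$, the vector $\Theta(\mathcal X,\mathcal Y)$ has only one component $\theta(x,y) \in [0, \pi/2]$. This component is the acute angle between $x$ and $y$ defined in the standard way, i.e., $\cos\theta(x,y) = |(x,y)|$. If $\mathcal H = \mathbb C^n$ or $\mathbb R^n$, let orthonormal columns of matrices $X$ and $Y$ span the subspaces $\mathcal X$ and $\mathcal Y$, respectively. Then $P_{\mathcal X} = XX^H$ and $P_{\mathcal Y} = YY^H$, so

$$\left(\cos^2\Theta(\mathcal X,\mathcal Y)\right)^{\downarrow} = \Lambda((P_{\mathcal X}P_{\mathcal Y})|_{\mathcal X}) = \Lambda(X^H(YY^H)X) = S^2(Y^HX).$$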
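Assuming real data, the matrix formula $\cos^2\Theta = S^2(Y^HX)$ gives a short way to compute principal angles. The sketch below uses a hypothetical helper, `principal_angles`, which returns the angles in decreasing order to match the convention above. It also cross-checks the cosines squared against the Ritz values of $P_{\mathcal Y}$ on $\mathcal X$. Taking the arccos of the cosines loses accuracy for tiny angles. SciPy also provides `scipy.linalg.subspace_angles`, which may be preferable when small angles matter.

```python
import numpy as np


def principal_angles(X, Y):
    """Hypothetical helper: angles from span(X) to span(Y), dim X <= dim Y.

    Returned in decreasing order (cosines increasing). Uses cos^2 = S^2(Y^H X)
    for orthonormal bases; arccos is inaccurate for very small angles.
    """
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    cosines = np.linalg.svd(Qy.conj().T @ Qx, compute_uv=False)   # decreasing
    return np.arccos(np.clip(cosines[::-1], 0.0, 1.0))


X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])   # spans e1 and e2 + e3
theta = principal_angles(X, Y)
print(np.degrees(theta))                               # approximately [45, 0]

# The cosines squared are the Ritz values of P_Y on span(X):
Qx, _ = np.linalg.qr(X)
Qy, _ = np.linalg.qr(Y)
ritz_PY = np.linalg.eigvalsh(Qx.T @ (Qy @ Qy.T) @ Qx)  # increasing
print(np.allclose(ritz_PY, np.cos(theta) ** 2))
```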



2.2. Motivation. Let us first demonstrate that traditional inequalities may not be adequate for bounds on Ritz values that involve angles between subspaces. Suppose that we bound the vector $|\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Y}A)|_{\mathcal Y})|$ of absolute values of matched distances between the decreasingly ordered Ritz values by the vector $\sin\Theta(\mathcal X,\mathcal Y)$ of the sines of principal angles between $\mathcal X$ and $\mathcal Y$. The aim is to estimate how changes in a trial subspace affect the Ritz values in the Rayleigh-Ritz method.

Without majorization, we can compare vectors using component-wise inequalities. For example, let $\dim\mathcal X = \dim\mathcal Y = 2$ and $\sin\Theta(\mathcal X,\mathcal Y) = [1, 0]$. Suppose that we had the inequality $|\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Y}A)|_{\mathcal Y})| \le C\sin\Theta(\mathcal X,\mathcal Y)$ with some $A$-dependent constant $C$. Since we set the smallest angle between $\mathcal X$ and $\mathcal Y$ to be zero, such an inequality would imply that at least one of the Ritz values for $\mathcal X$ is the same as that for $\mathcal Y$. This is true only in exceptional cases, e.g., if the subspaces $\mathcal X$ and $\mathcal Y$ intersect in an eigenvector of $A$. Changing even a single vector in the basis of the trial subspace in the Rayleigh-Ritz method typically results in changes in all Ritz values, so such an inequality cannot possibly hold.
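The counterexample argument can be made concrete with a small hypothetical matrix chosen for illustration. Below, the two trial subspaces share the vector $e_1$, which is not an eigenvector of $A$, and are otherwise orthogonal, so $\sin\Theta(\mathcal X,\mathcal Y) = [1, 0]$. Nevertheless, both Ritz values change, so no bound of the form $|\Delta\Lambda| \le C\sin\Theta$ can hold for this pair.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0]])          # hypothetical example; e1 is not an eigenvector
I = np.eye(3)
X, Y = I[:, [0, 1]], I[:, [0, 2]]        # share e1; e2 is orthogonal to span(Y)

# Sines of the principal angles, decreasing: [1, 0]
cosines = np.linalg.svd(Y.T @ X, compute_uv=False)
sin_theta = np.sort(np.sqrt(np.clip(1.0 - cosines**2, 0.0, None)))[::-1]

ritz_X = np.linalg.eigvalsh(X.T @ A @ X)[::-1]   # (3 +- sqrt(5)) / 2
ritz_Y = np.linalg.eigvalsh(Y.T @ A @ Y)[::-1]   # 1 +- sqrt(2)
changes = np.abs(ritz_X - ritz_Y)
print(sin_theta)   # [1, 0] up to rounding
print(changes)     # both entries are clearly nonzero
```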
Typical known bounds for changes in the Ritz values are inequalities that bound only the largest change in the Ritz values by the largest angle between $\mathcal X$ and $\mathcal Y$. Is it possible to take advantage of the knowledge of the other angles to get improved bounds for the change in the Ritz values? Majorization comes to the rescue as a natural tool for such bounds.

2.3. Sine-based bounds for equidimensional subspaces. This is the first 
main result of the paper for trial subspaces of the same dimension. 

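Theorem 2.1. Let $\mathcal X$ and $\mathcal Y$ be subspaces of $\mathcal H$ with $\dim\mathcal X = \dim\mathcal Y$, and let the operator $A$ be Hermitian. Then (see [12, Remark 4.1])

$$\left|\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Y}A)|_{\mathcal Y})\right| \prec_w \left(\lambda_{\max(\mathcal X+\mathcal Y)} - \lambda_{\min(\mathcal X+\mathcal Y)}\right)\sin\Theta(\mathcal X,\mathcal Y), \tag{2.1}$$

where $\lambda_{\min(\mathcal X+\mathcal Y)}$ and $\lambda_{\max(\mathcal X+\mathcal Y)}$ are the smallest and largest eigenvalues of the operator $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$, respectively. Also, if one of the subspaces is $A$-invariant then

$$\left|\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Y}A)|_{\mathcal Y})\right| \prec_w \left(\lambda_{\max(\mathcal X+\mathcal Y)} - \lambda_{\min(\mathcal X+\mathcal Y)}\right)\sin^2\Theta(\mathcal X,\mathcal Y). \tag{2.2}$$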

If, e.g., $\mathcal X$ is $A$-invariant, then $\Lambda((P_{\mathcal X}A)|_{\mathcal X})$ is a subset of the eigenvalues of $A$, counting the multiplicities, so the left-hand side of bound (2.2) represents the absolute eigenvalue approximation error in the Rayleigh-Ritz method.
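Theorem 2.1 can be spot-checked numerically. The following sketch uses hypothetical helper names and random test data. It computes both sides of (2.1) for two arbitrary equidimensional subspaces, and both sides of (2.2) for an $A$-invariant subspace that need not correspond to extreme eigenvalues. A check of this kind only illustrates the bounds; it does not prove them.

```python
import numpy as np


def orth(M, tol=1e-10):
    """Hypothetical helper: orthonormal basis of the column space of M (SVD-based)."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol * s[0]]


def ritz(A, Q):
    """Decreasing Ritz values of symmetric A on span(Q); Q has orthonormal columns."""
    return np.linalg.eigvalsh(Q.T @ A @ Q)[::-1]


def sines(X, Y):
    """Decreasing sines of the angles between equidimensional span(X) and span(Y)."""
    return np.linalg.svd(X - Y @ (Y.T @ X), compute_uv=False)


def spread(A, U, W):
    """lambda_max - lambda_min of (P_{U+W} A)|_{U+W}."""
    Q = orth(np.hstack([U, W]))
    mu = np.linalg.eigvalsh(Q.T @ A @ Q)
    return mu[-1] - mu[0]


def weakly_majorized(x, y, tol=1e-10):
    return bool(np.all(np.cumsum(np.sort(x)[::-1]) <= np.cumsum(np.sort(y)[::-1]) + tol))


rng = np.random.default_rng(1)
n, k = 10, 3
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
X = np.linalg.eigh(A)[1][:, [0, 4, 7]]           # A-invariant, eigenvalues not contiguous
Y = orth(X + 0.3 * rng.standard_normal((n, k)))  # nearby trial subspace
Z = orth(rng.standard_normal((n, k)))            # arbitrary trial subspace

ok_21 = weakly_majorized(np.abs(ritz(A, Z) - ritz(A, Y)), spread(A, Z, Y) * sines(Z, Y))
ok_22 = weakly_majorized(np.abs(ritz(A, X) - ritz(A, Y)), spread(A, X, Y) * sines(X, Y) ** 2)
print(ok_21, ok_22)   # Theorem 2.1 predicts True True
```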

All proofs of the new main results are collected in section 4.3; e.g., we give a unified proof of both bounds of Theorem 2.1 in section 4.3.1. Our proof is shorter and simpler, but more sophisticated, compared to that used in [1], which covers a particular case of (2.2) only. There are two novel key ideas in the proof. First, we concatenate the absolute values in the left-hand side of (2.1) (or (2.2)) with the same values but with the negative sign. Second, our new generalized pinching inequality (4.2) of Theorem 4.5 accurately bounds the concatenated vector.

Bound (2.1) is proved in [12], and bound (2.2) is conjectured in [1, Conjecture 3.1]. Both, however, are stated there with a larger constant: the spread $\lambda_{\max} - \lambda_{\min}$ of the spectrum of $A$, where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues of $A$, respectively. Bounds (2.1) and (2.2) use the smaller spread, $\lambda_{\max(\mathcal X+\mathcal Y)} - \lambda_{\min(\mathcal X+\mathcal Y)} \le \lambda_{\max} - \lambda_{\min}$, of the spectrum of the operator $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$. However, [12, Remark 4.1] states that these two statements are in fact equivalent. The argument of [12, Remark 4.1] is essential and used several times below. For completeness, let us reproduce it here.

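In this paper, we always assume that both subspaces $\mathcal X$ and $\mathcal Y$ are finite dimensional, cf. [13]. Let us consider the finite dimensional subspace $\mathcal X+\mathcal Y$ and the operator $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$ as replacements for the original space $\mathcal H$ and the original operator $A$. Suppose we define a Rayleigh-Ritz operator, e.g., $(P_{\mathcal X}A)|_{\mathcal X}$ on $\mathcal X$. Whether we start with the original space $\mathcal H$ and the operator $A$, or with the reduced space $\mathcal X+\mathcal Y$ and the operator $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$, the outcome is evidently the same: $(P_{\mathcal X}A)|_{\mathcal X} = \left(P_{\mathcal X}(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}\right)|_{\mathcal X}$.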

Moreover, if $\mathcal X$ is $A$-invariant, it is also $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$-invariant, corresponding to the same set of eigenvalues. If this set of eigenvalues is the contiguous set of the largest eigenvalues of $A$, it is also the contiguous set of the largest eigenvalues of $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$. Thus, without loss of generality, we can substitute the space $\mathcal X+\mathcal Y$ and the operator $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$ for the space $\mathcal H$ and the operator $A$. This simple substitution improves the constants and, most importantly, allows us to handle easily the case of infinite dimensional $\mathcal H$, as we explain later in Lemma 2.6.
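The reduction to $\mathcal X+\mathcal Y$ is easy to confirm in coordinates. This sketch uses random test data and an orthonormal basis `Q` of $\mathcal X+\mathcal Y$ obtained by QR, which assumes `[X, Y]` has full column rank. The Ritz values on $\mathcal X$ computed from $A$ and from the reduced matrix $Q^TAQ$ coincide, because $QQ^TX = X$. The second part illustrates that the top eigenvalues associated with an invariant $\mathcal X$ remain the top eigenvalues of the reduced operator.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 2
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
X, _ = np.linalg.qr(rng.standard_normal((n, k)))   # arbitrary trial subspace
Y, _ = np.linalg.qr(rng.standard_normal((n, k)))
Q, _ = np.linalg.qr(np.hstack([X, Y]))             # orthonormal basis of X + Y (full rank here)

A_red = Q.T @ A @ Q        # matrix of (P_{X+Y} A)|_{X+Y} in the basis Q
X_red = Q.T @ X            # the same subspace X, in coordinates of the basis Q

# Same Ritz values on X from the original and from the reduced problem:
print(np.allclose(np.linalg.eigvalsh(X.T @ A @ X),
                  np.linalg.eigvalsh(X_red.T @ A_red @ X_red)))

# If X is spanned by top eigenvectors of A, its eigenvalues are also the top
# eigenvalues of the reduced operator on X + Y.
lam, V = np.linalg.eigh(A)
X_top = V[:, -k:]
Q2, _ = np.linalg.qr(np.hstack([X_top, Y]))
mu = np.linalg.eigvalsh(Q2.T @ A @ Q2)
print(np.allclose(mu[-k:], lam[-k:]))
```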
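2.4. Improved sine-based bounds for equidimensional subspaces. In the right-hand sides of both bounds in Theorem 2.1 the scalar $\lambda_{\max(\mathcal X+\mathcal Y)} - \lambda_{\min(\mathcal X+\mathcal Y)}$ appears. We want to improve Theorem 2.1 by replacing the scalar factor with the following decreasing vector of different scalar factors:

$$\operatorname{Spr}_{(\mathcal X+\mathcal Y)} = \left[\lambda_i(\mathcal X+\mathcal Y) - \lambda_{-i}(\mathcal X+\mathcal Y)\right], \quad i = 1, \dots, \dim\mathcal X,$$

where $\lambda_1(\mathcal X+\mathcal Y) \ge \cdots \ge \lambda_{\dim\mathcal X}(\mathcal X+\mathcal Y)$ and $\lambda_{-1}(\mathcal X+\mathcal Y) \le \cdots \le \lambda_{-\dim\mathcal X}(\mathcal X+\mathcal Y)$ are the $\dim\mathcal X$ largest and smallest, respectively, eigenvalues of the operator $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$. The vector $\Theta(\mathcal X,\mathcal Y)$ has $\dim(\mathcal X+\mathcal Y) - \dim\mathcal Y$ nonzero components. Upcoming bounds (2.3) and (2.4) have component-wise products of the vectors $\operatorname{Spr}_{(\mathcal X+\mathcal Y)}$ and $\sin\Theta(\mathcal X,\mathcal Y)$ in their right-hand sides. Consequently, those bounds actually use only the first $\dim(\mathcal X+\mathcal Y) - \dim\mathcal Y$ components of $\operatorname{Spr}_{(\mathcal X+\mathcal Y)}$, which are all nonnegative.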
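The vector spread $\operatorname{Spr}_{(\mathcal X+\mathcal Y)}$ can be computed directly from an orthonormal basis of $\mathcal X+\mathcal Y$. The NumPy sketch below is an illustration only, and the helper names `orth` and `vector_spread` are hypothetical. The test data, $A = \operatorname{diag}(2, 1, 0, 0)$ with the subspaces spanned by the columns of `X` and `Y`, are chosen so that $\mathcal X+\mathcal Y = \mathbb R^4$.

```python
import numpy as np


def orth(M, tol=1e-10):
    """Hypothetical helper: orthonormal basis of the column space of M (SVD-based)."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol * s[0]]


def vector_spread(A, X, Y):
    """Hypothetical helper for Spr_(X+Y): i-th largest minus i-th smallest
    eigenvalue of (P_{X+Y} A)|_{X+Y}, i = 1, ..., dim X.

    Only the first dim(X+Y) - dim Y components (all nonnegative) enter (2.3), (2.4).
    """
    Q = orth(np.hstack([X, Y]))
    mu = np.linalg.eigvalsh(Q.T @ A @ Q)     # increasing
    k = X.shape[1]
    return mu[::-1][:k] - mu[:k]


A = np.diag([2.0, 1.0, 0.0, 0.0])
X = np.eye(4)[:, :2]
Y = np.array([[1.0, 0.0], [1.0, 2.0], [2.0, -2.0], [0.0, 1.0]])
print(vector_spread(A, X, Y))                # approximately [2, 1]
```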

Let us consider an extreme case example with $\sin\Theta(\mathcal X,\mathcal Y) = [1, 1]$, i.e., uncorrelated $\mathcal X$ and $\mathcal Y$. In this case, the largest variation in the individual Ritz values is clearly bounded by the scalar value $\lambda_{\max(\mathcal X+\mathcal Y)} - \lambda_{\min(\mathcal X+\mathcal Y)}$ of the spread of the spectrum of $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$. This scalar is already used in Theorem 2.1, and it is the first component of the vector $\operatorname{Spr}_{(\mathcal X+\mathcal Y)}$. Now let us consider the sum of both components of the vector $|\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Y}A)|_{\mathcal Y})|$. This sum takes the largest value if $\mathcal X$ is the span of two eigenvectors of $A$ corresponding to its largest eigenvalues, while $\mathcal Y$ is the span of two eigenvectors of $A$ corresponding to its smallest eigenvalues. This largest value is exactly the sum of both components of the vector $\operatorname{Spr}_{(\mathcal X+\mathcal Y)}$. The example suggests that the vector spread $\operatorname{Spr}_{(\mathcal X+\mathcal Y)}$ might be the appropriate vector of constants to replace the scalar spread $\lambda_{\max(\mathcal X+\mathcal Y)} - \lambda_{\min(\mathcal X+\mathcal Y)}$ in Theorem 2.1. Our numerical tests motivate the following conjecture.

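Conjecture 2.1. Let $\mathcal X$ and $\mathcal Y$ be subspaces of $\mathcal H$ with $\dim\mathcal X = \dim\mathcal Y$ and the operator $A$ be Hermitian. Then

$$\left|\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Y}A)|_{\mathcal Y})\right| \prec_w \operatorname{Spr}_{(\mathcal X+\mathcal Y)}\sin\Theta(\mathcal X,\mathcal Y). \tag{2.3}$$

If in addition one of the subspaces is $A$-invariant then

$$\left|\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Y}A)|_{\mathcal Y})\right| \prec_w \operatorname{Spr}_{(\mathcal X+\mathcal Y)}\sin^2\Theta(\mathcal X,\mathcal Y). \tag{2.4}$$

Since $\max\{\operatorname{Spr}_{(\mathcal X+\mathcal Y)}\} = \lambda_{\max(\mathcal X+\mathcal Y)} - \lambda_{\min(\mathcal X+\mathcal Y)}$, bounds (2.1) and (2.2) would follow from (2.3) and (2.4), respectively.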
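The extreme example of uncorrelated subspaces that motivated Conjecture 2.1 can be reproduced with a diagonal matrix. The eigenvalues 5, 3, 2, 1 below are hypothetical. With $\mathcal X$ spanned by the top two eigenvectors and $\mathcal Y$ by the bottom two, $\sin\Theta = [1, 1]$. The sum of the Ritz value changes then equals the sum of the components of $\operatorname{Spr}_{(\mathcal X+\mathcal Y)}$, whereas the scalar spread would allow the much larger total $2 \cdot 4 = 8$.

```python
import numpy as np

d = np.array([5.0, 3.0, 2.0, 1.0])   # hypothetical eigenvalues, decreasing
A = np.diag(d)
I = np.eye(4)
X, Y = I[:, :2], I[:, 2:]            # top and bottom eigenvectors: sin(Theta) = [1, 1]

ritz_X = np.linalg.eigvalsh(X.T @ A @ X)[::-1]   # [5, 3]
ritz_Y = np.linalg.eigvalsh(Y.T @ A @ Y)[::-1]   # [2, 1]
lhs = np.abs(ritz_X - ritz_Y)                    # [3, 2]

mu = np.linalg.eigvalsh(A)                       # X + Y is the whole space here
spr = mu[::-1][:2] - mu[:2]                      # [5 - 1, 3 - 2] = [4, 1]
scalar_spread = mu[-1] - mu[0]                   # 4, the first component of spr

print(lhs.sum(), spr.sum())                      # both sums equal 5
print(np.cumsum(lhs) <= np.cumsum(spr))          # partial sums: lhs <_w Spr * [1, 1]
```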

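Here, we are able to prove only (2.4), and only under an additional assumption.

Theorem 2.2. If $\mathcal X$ is $A$-invariant and corresponds to the contiguous set of the largest eigenvalues of $A$, then bound (2.4) holds. Consequently, we obtain

$$0 \le \Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Y}A)|_{\mathcal Y}) \prec_w \left(\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \lambda_{\min(\mathcal X+\mathcal Y)}\right)\sin^2\Theta(\mathcal X,\mathcal Y), \tag{2.5}$$

where we take into account that in this case $\operatorname{Spr}_{(\mathcal X+\mathcal Y)} \le \Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \lambda_{\min(\mathcal X+\mathcal Y)}$.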
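Bounds (2.4) and (2.5) of Theorem 2.2 can be spot-checked in the same spirit. The sketch below assumes real symmetric random test data and takes $\mathcal X$ spanned by the eigenvectors of the three largest eigenvalues. The helpers `orth` and `weakly_majorized` are hypothetical. The sketch also confirms the componentwise inequality $\operatorname{Spr}_{(\mathcal X+\mathcal Y)} \le \Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \lambda_{\min(\mathcal X+\mathcal Y)}$ used to pass from (2.4) to (2.5).

```python
import numpy as np


def orth(M, tol=1e-10):
    """Hypothetical helper: orthonormal basis of the column space of M (SVD-based)."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol * s[0]]


def weakly_majorized(x, y, tol=1e-10):
    return bool(np.all(np.cumsum(np.sort(x)[::-1]) <= np.cumsum(np.sort(y)[::-1]) + tol))


rng = np.random.default_rng(3)
n, k = 9, 3
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
lam, V = np.linalg.eigh(A)
X = V[:, -k:]                                    # contiguous set of the largest eigenvalues
Y = orth(X + 0.4 * rng.standard_normal((n, k)))

ritz_X = lam[-k:][::-1]                          # decreasing
ritz_Y = np.linalg.eigvalsh(Y.T @ A @ Y)[::-1]
Q = orth(np.hstack([X, Y]))
mu = np.linalg.eigvalsh(Q.T @ A @ Q)             # spectrum of (P_{X+Y} A)|_{X+Y}, increasing
sin2 = np.linalg.svd(X - Y @ (Y.T @ X), compute_uv=False) ** 2   # decreasing

lhs = ritz_X - ritz_Y                            # nonnegative in this setting
spr = mu[::-1][:k] - mu[:k]                      # Spr_(X+Y)
print(weakly_majorized(lhs, spr * sin2))               # bound (2.4)
print(weakly_majorized(lhs, (ritz_X - mu[0]) * sin2))  # bound (2.5)
```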

Bounds (2.3), (2.4), and (2.5) are delicate. An attempt to improve them by 
dividing both sides by the vector of the constants breaks them all, as can be checked 
by running the following MATLAB code (see [10] for SUBSPACEA's description): 
A=diag([2 1 0 0]); X=[1 0;0 1;0 0;0 0]; Y=orth([1 0;1 2;2 -2;0 1]); % A is 4-by-4 to match X and Y
Spr=[2 1]'; SinTheta=flipud(sort(sin(subspacea(X,Y)))) % sines of principal angles, decreasing
LeftHandSide=flipud(sort(abs(sort(eig(X'*A*X))-sort(eig(Y'*A*Y))))) % matched Ritz value changes, decreasing
sum(LeftHandSide)<=sum(Spr.*SinTheta.*SinTheta) % (2.4) and (2.5) hold
sum(LeftHandSide./Spr)<=sum(SinTheta) % divided (2.3) breaks
In this example, the vector of the constants in (2.5) is equal to $\operatorname{Spr}_{(\mathcal X+\mathcal Y)}$, i.e., bounds (2.4) and (2.5) are the same. They must hold, since the assumptions of Theorem 2.2 are satisfied, and the code confirms this. The last line shows that even (2.3), the weakest of the three bounds, does not hold if divided by $\operatorname{Spr}_{(\mathcal X+\mathcal Y)}$.
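For readers without MATLAB or SUBSPACEA, the same experiment can be sketched in NumPy. The sketch assumes the 4-by-4 matrix $A = \operatorname{diag}(2, 1, 0, 0)$. With that choice, $A$ matches the four-row $X$ and $Y$, and $\mathcal X+\mathcal Y = \mathbb R^4$ gives $\operatorname{Spr}_{(\mathcal X+\mathcal Y)} = [2, 1]$. The sines are taken from the singular values of $(I - YY^T)X$, a simple substitute for SUBSPACEA that is adequate for these moderate angles.

```python
import numpy as np

A = np.diag([2.0, 1.0, 0.0, 0.0])
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
Y, _ = np.linalg.qr(np.array([[1.0, 0.0], [1.0, 2.0], [2.0, -2.0], [0.0, 1.0]]))  # like orth
Spr = np.array([2.0, 1.0])

# Decreasing sines of the principal angles: singular values of (I - Y Y^T) X
sin_theta = np.linalg.svd(X - Y @ (Y.T @ X), compute_uv=False)

# Matched absolute differences of increasingly sorted Ritz values, then decreasing
diffs = np.abs(np.linalg.eigvalsh(X.T @ A @ X) - np.linalg.eigvalsh(Y.T @ A @ Y))
lhs = np.sort(diffs)[::-1]

print(lhs.sum() <= (Spr * sin_theta**2).sum())   # True:  (2.4) and (2.5) hold
print((lhs / Spr).sum() <= sin_theta.sum())      # False: divided (2.3) breaks
```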

Our bound (2.5) is competitive with the following known inequality,

$$0 \le \Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Y}A)|_{\mathcal Y}) \le \left(\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \lambda_{\min(\mathcal X+\mathcal Y)}\right)\max\{\sin^2\Theta(\mathcal X,\mathcal Y)\}, \tag{2.6}$$

which is proved in [8] and presented above in a slightly modified formulation to make it consistent with (2.5). There is no majorization in (2.6): each vector component is bounded separately. Thus, one can divide (2.6) by the vector of the constants, in contrast to (2.5). But (2.6) uses only the largest angle, so it would be inferior to (2.5) if the other angles are much smaller than the largest angle. This is a common situation in applications; e.g., see section 3 on FEM.

2.5. Multiplicative and tangent-based bounds. The main goal of this subsection is to formulate multiplicative analogs for the majorization-type bounds of the previous subsection, based on products rather than sums.

The definition of majorization for vectors is based on the sums of vector components. It can also deal with the products of nonnegative vector components using the following conventions. If for nonnegative vectors $x = [x_1, \dots, x_n]$ and $y = [y_1, \dots, y_n]$ it holds that $\prod_{i=1}^k x_i^{\downarrow} \le \prod_{i=1}^k y_i^{\downarrow}$, $k = 1, \dots, n$, we write $\log x \prec_w \log y$. If in addition $\prod_{i=1}^n x_i = \prod_{i=1}^n y_i$, we write $\log x \prec \log y$. For strictly positive vectors these conventions follow directly from the definition of (weak) majorization.
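The product conventions can be checked with cumulative products, which avoids taking logarithms of zero entries. The helper `log_weakly_majorized` and its relative tolerance `rtol` are hypothetical. The vectors are assumed nonnegative and of equal length.

```python
import numpy as np


def log_weakly_majorized(x, y, rtol=1e-12):
    """Hypothetical helper: log x <_w log y for nonnegative x, y of equal length.

    The product of the k largest entries of x must not exceed that of y,
    k = 1, ..., n. Zero entries are allowed, since no logarithms are taken.
    """
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    return bool(np.all(np.cumprod(xs) <= np.cumprod(ys) * (1.0 + rtol)))


print(log_weakly_majorized([2, 2], [4, 1]))   # True: 2 <= 4 and 4 <= 4
print(log_weakly_majorized([3, 1], [2, 2]))   # False: 3 > 2
print(log_weakly_majorized([1, 0], [1, 1]))   # True: a zero entry makes the product vanish
```

For nonnegative vectors, weak log-majorization implies weak majorization (a classical fact). In the first example, for instance, the partial sums 2, 4 of $x$ are bounded by the partial sums 4, 5 of $y$.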

The example of subsection 2.2 implies that it is impossible to bound products of changes of different Ritz values by the products of the sines of the principal angles. Our novel multiplicative bound below uses $1 + \tan^2$ rather than $\sin^2$ to bound the relative eigenvalue error in the form of the products.

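Theorem 2.3. Under the assumptions of Theorem 2.2, let $\Theta(\mathcal X,\mathcal Y) < \pi/2$ and $\Lambda((P_{\mathcal X}A)|_{\mathcal X}) > \lambda_{\min(\mathcal X+\mathcal Y)}$. Then $\Lambda((P_{\mathcal Y}A)|_{\mathcal Y}) > \lambda_{\min(\mathcal X+\mathcal Y)}$ and we have

$$0 \le \log\frac{\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \lambda_{\min(\mathcal X+\mathcal Y)}}{\Lambda((P_{\mathcal Y}A)|_{\mathcal Y}) - \lambda_{\min(\mathcal X+\mathcal Y)}} \prec_w \log\left(1 + \tan^2\Theta(\mathcal X,\mathcal Y)\right),$$

which leads to

$$0 \le \frac{\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Y}A)|_{\mathcal Y})}{\Lambda((P_{\mathcal Y}A)|_{\mathcal Y}) - \lambda_{\min(\mathcal X+\mathcal Y)}} \prec_w \tan^2\Theta(\mathcal X,\mathcal Y).$$

We note that either majorization result of Theorem 2.3 implies the bound

$$0 \le \frac{\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Y}A)|_{\mathcal Y})}{\Lambda((P_{\mathcal Y}A)|_{\mathcal Y}) - \lambda_{\min(\mathcal X+\mathcal Y)}} \le \max\{\tan^2\Theta(\mathcal X,\mathcal Y)\},$$

which is an equivalent form of the already known sine-based inequality (2.6).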
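The following is a numerical illustration of Theorem 2.3, under the same assumptions as in the check of Theorem 2.2: $\mathcal X$ is spanned by top eigenvectors of a random symmetric matrix, and $\mathcal Y$ is a nearby subspace. The helpers are hypothetical. The sketch tests the multiplicative bound through cumulative products, the additive tangent bound through partial sums, and the max-tangent form, which is equivalent to (2.6).

```python
import numpy as np


def orth(M, tol=1e-10):
    """Hypothetical helper: orthonormal basis of the column space of M (SVD-based)."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol * s[0]]


def log_weakly_majorized(x, y, rtol=1e-10):
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    return bool(np.all(np.cumprod(xs) <= np.cumprod(ys) * (1.0 + rtol)))


rng = np.random.default_rng(4)
n, k = 9, 3
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
lam, V = np.linalg.eigh(A)
X = V[:, -k:]                                    # largest eigenvalues, as in Theorem 2.2
Y = orth(X + 0.3 * rng.standard_normal((n, k)))

ritz_X = lam[-k:][::-1]
ritz_Y = np.linalg.eigvalsh(Y.T @ A @ Y)[::-1]
Q = orth(np.hstack([X, Y]))
lam_min = np.linalg.eigvalsh(Q.T @ A @ Q)[0]     # lambda_min(X+Y)
sin2 = np.linalg.svd(X - Y @ (Y.T @ X), compute_uv=False) ** 2
tan2 = sin2 / (1.0 - sin2)                       # requires Theta(X, Y) < pi/2

ratio = (ritz_X - lam_min) / (ritz_Y - lam_min)  # >= 1 componentwise
rel = (ritz_X - ritz_Y) / (ritz_Y - lam_min)     # equals ratio - 1
print(log_weakly_majorized(ratio, 1.0 + tan2))   # multiplicative bound
print(np.all(np.cumsum(np.sort(rel)[::-1]) <= np.cumsum(np.sort(tan2)[::-1]) + 1e-10))
print(rel.max() <= tan2.max())                   # max form, equivalent to (2.6)
```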

2.6. Bounds for non-equidimensional subspaces. In all statements we have made so far we have assumed that $\dim\mathcal X = \dim\mathcal Y$, but applications require the more general assumption $\dim\mathcal X \le \dim\mathcal Y$, where we interpret the principal angles $\Theta(\mathcal X,\mathcal Y)$ as the angles from $\mathcal X$ to $\mathcal Y$. In this paper, we briefly consider one such well known application of the Rayleigh-Ritz method, the finite element method for partial differential equations, in section 3.
Since we compare the $\dim\mathcal X$ Ritz values for the trial subspace $\mathcal X$ against the $\dim\mathcal Y > \dim\mathcal X$ Ritz values for the trial subspace $\mathcal Y$, there are two options. We can either specifically choose some appropriate $\dim\mathcal X$ Ritz values out of the $\dim\mathcal Y$ Ritz values for $\mathcal Y$, or simply state that there exist $\dim\mathcal X$ Ritz values for the trial subspace $\mathcal Y$ such that our bounds hold.

Bounds (2.1) and (2.3) do not hold if $\dim\mathcal X < \dim\mathcal Y$, even using the latter, weaker, statement. Indeed, e.g., let $\dim\mathcal X = 1$ and $\Theta(\mathcal X,\mathcal Y) = 0$. Then either bound (2.1) or (2.3) would imply that $\Lambda((P_{\mathcal X}A)|_{\mathcal X})$, in this case a single number, is one of the Ritz values for the trial subspace $\mathcal Y$. This is not true, since $\mathcal X$ is arbitrary in $\mathcal Y$.

Known results, e.g., [9, 14], cover the case $\dim\mathcal X < \dim\mathcal Y$ with an arbitrary $A$-invariant subspace $\mathcal X$. They guarantee the existence of $\dim\mathcal X$ Ritz values for the trial subspace $\mathcal Y$ that are good approximations of the $\dim\mathcal X$ eigenvalues associated with $\mathcal X$, provided $\Theta(\mathcal X,\mathcal Y)$ is small. However, our numerical tests show that (2.2) and (2.4) still fail in this case; cf. [14, Lemma 2.6]. An approach of [14, Theorem 2.7] may help to overcome the obstacle, but it is outside of the scope of this paper. Here we consider only the particular case where the $A$-invariant subspace $\mathcal X$ corresponds to the contiguous set of the largest eigenvalues $\Lambda((P_{\mathcal X}A)|_{\mathcal X})$ of $A$.

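Theorem 2.4. Let $\dim\mathcal X \le \dim\mathcal Y$, the operator $A$ be Hermitian, and the $A$-invariant subspace $\mathcal X$ correspond to the contiguous set of the largest eigenvalues of $A$. Let $\Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y})$ denote the $\dim\mathcal X$ largest eigenvalues of $(P_{\mathcal Y}A)|_{\mathcal Y}$. Then

$$0 \le \Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y}) \prec_w \left(\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \lambda_{\min(\mathcal X+\mathcal Y)}\right)\sin^2\Theta(\mathcal X,\mathcal Y); \tag{2.7}$$

if $\Theta(\mathcal X,\mathcal Y) < \pi/2$ and $\Lambda((P_{\mathcal X}A)|_{\mathcal X}) > \lambda_{\min(\mathcal X+\mathcal Y)}$, then $\Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y}) > \lambda_{\min(\mathcal X+\mathcal Y)}$,

$$0 \le \log\frac{\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \lambda_{\min(\mathcal X+\mathcal Y)}}{\Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y}) - \lambda_{\min(\mathcal X+\mathcal Y)}} \prec_w \log\left(1 + \tan^2\Theta(\mathcal X,\mathcal Y)\right), \tag{2.8}$$

and

$$0 \le \frac{\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y})}{\Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y}) - \lambda_{\min(\mathcal X+\mathcal Y)}} \prec_w \tan^2\Theta(\mathcal X,\mathcal Y). \tag{2.9}$$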

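Proof. We use a technique presented in [8] to extend Rayleigh-Ritz error bounds for the particular case $\dim\mathcal X = \dim\mathcal Y$ to the general case $\dim\mathcal X \le \dim\mathcal Y$.

Let $\dim\mathcal X < \dim\mathcal Y$ and $\Theta(\mathcal X,\mathcal Y) < \pi/2$. We define a new subspace $\mathcal Z$ to be the orthogonal projection of $\mathcal X$ onto $\mathcal Y$, i.e., $\mathcal Z = P_{\mathcal Y}\mathcal X$. The assumption $\Theta(\mathcal X,\mathcal Y) < \pi/2$ gives $\dim\mathcal X = \dim\mathcal Z$ and $\Theta(\mathcal X,\mathcal Y) = \Theta(\mathcal X,\mathcal Z)$. Since $\mathcal Z \subseteq \mathcal Y$, the Courant-Fischer min-max principle evidently implies two facts. First, $\Lambda((P_{\mathcal Z}A)|_{\mathcal Z}) \le \Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y})$ for the largest $\dim\mathcal X = \dim\mathcal Z$ eigenvalues. Second, $\lambda_{\min(\mathcal X+\mathcal Z)} \ge \lambda_{\min(\mathcal X+\mathcal Y)}$. Hence (2.5), applied to $\mathcal X$ and $\mathcal Z$, leads to (2.7). Since bound (2.7) depends continuously on $\Theta(\mathcal X,\mathcal Y)$, the assumption $\Theta(\mathcal X,\mathcal Y) < \pi/2$ can be removed by the continuity argument.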

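Now we apply (2.6) to the pair of subspaces $\mathcal X$ and $\mathcal Z$ instead of $\mathcal X$ and $\mathcal Y$, i.e.,

$$\begin{aligned} 0 &\le \Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y}) \le \Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda((P_{\mathcal Z}A)|_{\mathcal Z}) \\ &\le \left(\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \lambda_{\min(\mathcal X+\mathcal Z)}\right)\max\{\sin^2\Theta(\mathcal X,\mathcal Z)\} \\ &\le \left(\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \lambda_{\min(\mathcal X+\mathcal Y)}\right)\max\{\sin^2\Theta(\mathcal X,\mathcal Y)\}. \end{aligned}$$

This gives the following known inequality, e.g., [8],

$$0 \le \Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y}) \le \left(\Lambda((P_{\mathcal X}A)|_{\mathcal X}) - \lambda_{\min(\mathcal X+\mathcal Y)}\right)\max\{\sin^2\Theta(\mathcal X,\mathcal Y)\}. \tag{2.10}$$

If $\Theta(\mathcal X,\mathcal Y) < \pi/2$ and $\Lambda((P_{\mathcal X}A)|_{\mathcal X}) > \lambda_{\min(\mathcal X+\mathcal Y)}$, then $\Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y}) > \lambda_{\min(\mathcal X+\mathcal Y)}$. Theorem 2.3 immediately leads to (2.8) and (2.9) by monotonicity arguments. □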
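The key construction in the proof, $\mathcal Z = P_{\mathcal Y}\mathcal X$, can be examined numerically. This sketch uses random test data, a hypothetical helper `orth`, and $\dim\mathcal X = 2 < \dim\mathcal Y = 4$. It shows three things: the angles from $\mathcal X$ to $\mathcal Z$ coincide with those from $\mathcal X$ to $\mathcal Y$; the Ritz values on $\mathcal Z$ lie below the $\dim\mathcal X$ largest Ritz values on $\mathcal Y$; and bound (2.7) holds.

```python
import numpy as np


def orth(M, tol=1e-10):
    """Hypothetical helper: orthonormal basis of the column space of M (SVD-based)."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol * s[0]]


rng = np.random.default_rng(5)
n, k, m = 10, 2, 4                                 # dim X = 2 < dim Y = 4
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
lam, V = np.linalg.eigh(A)
X = V[:, -k:]                                      # contiguous largest eigenvalues
Y = orth(np.hstack([X + 0.3 * rng.standard_normal((n, k)),
                    rng.standard_normal((n, m - k))]))
Z = orth(Y @ (Y.T @ X))                            # Z = P_Y X, so dim Z = dim X

cos_XY = np.linalg.svd(Y.T @ X, compute_uv=False)  # cosines of angles from X to Y
cos_XZ = np.linalg.svd(Z.T @ X, compute_uv=False)
print(np.allclose(cos_XY, cos_XZ))                 # Theta(X, Y) = Theta(X, Z)

ritz_Z = np.linalg.eigvalsh(Z.T @ A @ Z)[::-1]
ritz_Y_top = np.linalg.eigvalsh(Y.T @ A @ Y)[::-1][:k]   # the dim X largest Ritz values
print(np.all(ritz_Z <= ritz_Y_top + 1e-12))        # Courant-Fischer, since Z lies in Y

# Bound (2.7): weak majorization checked through partial sums
Q = orth(np.hstack([X, Y]))
lam_min = np.linalg.eigvalsh(Q.T @ A @ Q)[0]
sin2 = np.sort(1.0 - cos_XY ** 2)[::-1]
lhs = lam[-k:][::-1] - ritz_Y_top
rhs = (lam[-k:][::-1] - lam_min) * sin2
print(np.all(np.cumsum(np.sort(lhs)[::-1]) <= np.cumsum(np.sort(rhs)[::-1]) + 1e-10))
```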
2.7. Relative eigenvalue error bounds. Our previous results bound the absolute value of the eigenvalue error. They are all invariant with respect to shifting the operator $A$ into $A + \alpha I$ for any real shift $\alpha$. For eigenvalues that are small in absolute value, it is also important to bound the relative error. Here we show how new relative bounds can be easily obtained from our Theorem 2.4. For relative bounds the shift-invariance will of course be lost, and it is natural to assume that $A > 0$.

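Let us first explain how relative bounds are obtained, e.g., from (2.10). Since $A > 0$, we have $\lambda_{\min(\mathcal X+\mathcal Y)} > 0$, so we can drop this term from the right-hand side of (2.10). We then divide both sides of the inequality by the vector $\Lambda((P_{\mathcal X}A)|_{\mathcal X}) > 0$, which gives

$$0 \le 1 - \frac{\Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y})}{\Lambda((P_{\mathcal X}A)|_{\mathcal X})} \le \max\{\sin^2\Theta(\mathcal X,\mathcal Y)\}. \tag{2.11}$$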

This is already a relative bound, but only for the largest eigenvalues, which is not so useful. We can turn the largest eigenvalues into the smallest ones by substituting $A^{-1}$ for $A$, as $A > 0$. However, this substitution alone does not reproduce the inverse of the Rayleigh quotient, since in general $(x,Ax)(x,A^{-1}x) \ne 1$, even for a unit vector $x$. There is a simple fix, though. Introducing the notation $(x,y)_A = (x,Ay)$ for the $A$-based scalar product, we have the following trivial but crucial identity for the Rayleigh quotient,

$$\frac{(x,Ax)}{(x,x)} = \frac{(x,x)_A}{(x,A^{-1}x)_A} = \left(\frac{(x,A^{-1}x)_A}{(x,x)_A}\right)^{-1}.$$

It implies that the Rayleigh-Ritz method on a trial subspace $\mathcal X$ gives the same Ritz vectors whether it is applied to the operator $A$ in the original scalar product $(\cdot,\cdot)$ or to the operator $A^{-1}$ in the $A$-based scalar product $(\cdot,\cdot)_A$. The Ritz values are reciprocals of each other. The use of the $A$-based scalar product changes the way we measure the angles; see [10]. Simultaneous substitutions of $A^{-1}$ for $A$ and $(\cdot,\cdot)_A$ for $(\cdot,\cdot)$ in (2.10) and Theorem 2.4 give the following new relative bounds.

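Theorem 2.5. Let $\dim\mathcal X \le \dim\mathcal Y$ and $\Theta(\mathcal X,\mathcal Y) < \pi/2$. Let the operator $A$ be Hermitian and positive definite, $A > 0$, and let the $A$-invariant subspace $\mathcal X$ correspond to the contiguous set of the smallest eigenvalues of $A$. Let $\Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y})$ denote the $\dim\mathcal X$ (counting the multiplicities) smallest eigenvalues of $(P_{\mathcal Y}A)|_{\mathcal Y}$. Finally, let $\Theta_A(\mathcal X,\mathcal Y)$ denote the vector of angles from $\mathcal X$ to $\mathcal Y$ defined in the $A$-based scalar product $(\cdot,\cdot)_A$. Then

$$0 \le 1 - \frac{\Lambda((P_{\mathcal X}A)|_{\mathcal X})}{\Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y})} \le \max\{\sin^2\Theta_A(\mathcal X,\mathcal Y)\}, \tag{2.12}$$

$$0 \le \log\frac{\Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y})}{\Lambda((P_{\mathcal X}A)|_{\mathcal X})} \prec_w \log\left(1 + \tan^2\Theta_A(\mathcal X,\mathcal Y)\right), \tag{2.13}$$

$$0 \le \frac{\Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y})}{\Lambda((P_{\mathcal X}A)|_{\mathcal X})} - 1 \prec_w \tan^2\Theta_A(\mathcal X,\mathcal Y). \tag{2.14}$$

Let us highlight that bound (2.13) is not only relative but also multiplicative.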
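The reciprocity of Ritz values in the two scalar products, which drives Theorem 2.5, can be verified with a non-orthonormal basis. The sketch below uses random symmetric positive definite test data and computes Ritz values from the pencil formed by the operator and Gram matrices. In the $A$-based product, the operator matrix of $A^{-1}$ is $X^TX$, since $(x, A^{-1}y)_A = (x, y)$, so $A^{-1}$ is never formed.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 7, 3
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)          # symmetric positive definite test matrix
X = rng.standard_normal((n, k))      # any basis of the trial subspace

# Ritz values of A in the standard scalar product: pencil (X^T A X, X^T X)
ritz_A = np.linalg.eigvals(np.linalg.solve(X.T @ X, X.T @ A @ X)).real

# Ritz values of A^{-1} in the A-based product (x, y)_A = (x, A y):
# the Gram matrix is X^T A X, and the operator matrix is X^T X.
ritz_Ainv = np.linalg.eigvals(np.linalg.solve(X.T @ A @ X, X.T @ X)).real

print(np.allclose(np.sort(ritz_A), np.sort(1.0 / ritz_Ainv)))   # reciprocals
```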

We finally note that the first statement in Theorem 2.4, the one with the sine, cannot be transformed into a relative bound in the same way. A seemingly natural extension $1 - \Lambda((P_{\mathcal X}A)|_{\mathcal X})/\Lambda_{\dim\mathcal X}((P_{\mathcal Y}A)|_{\mathcal Y}) \prec_w \sin^2\Theta_A(\mathcal X,\mathcal Y)$ of (2.12) is in fact wrong; see
A=diag([1 2 3 100]); X=[1 0;0 1;0 0;0 0]; Y=orth([-6 -1;-7 1;2 6;1 -7]);
SinThetaA=flipud(sort(sin(subspacea(X,Y,A)))); % sines of the angles in the A-based product
LeftHandSide=[1 1]'-[2 1]'./flipud(sort(eig(Y'*A*Y))); % 1 - Ritz value ratios, decreasing order
sum(LeftHandSide)<=sum(SinThetaA.*SinThetaA) % fails
2.8. Generalizations for Hilbert spaces. Here we extend some of the previous results to infinite dimensional spaces, using again [11, Remark 4.1]. Let $\mathcal H$ be an infinite dimensional Hilbert space and $A : \mathcal H \to \mathcal H$ be a linear bounded Hermitian operator. Let $P_{\mathcal X}$ and $P_{\mathcal Y}$ be orthogonal projectors onto the nontrivial finite dimensional subspaces $\mathcal X$ and $\mathcal Y$ with $\dim\mathcal X \le \dim\mathcal Y < \infty$. The vector of cosines squared of the $\dim\mathcal X$ principal angles from $\mathcal X$ to $\mathcal Y$ is defined by $\cos^2\Theta(\mathcal X,\mathcal Y) = \Lambda^{\uparrow}((P_{\mathcal X}P_{\mathcal Y})|_{\mathcal X})$. If $\mathcal X$ is $A$-invariant, the Ritz values $\Lambda((P_{\mathcal X}A)|_{\mathcal X})$ are some of the eigenvalues of $A$, since we assume that $\mathcal X$ is finite dimensional. Throughout the section, we use the vectors of eigenvalues $\Lambda$ enumerated in decreasing order only for finite dimensional operators, so the vectors have a finite number of components as before.

Both subspaces X and y are finite dimensional. Let us consider the finite dimen- 
sional subspace X + y and the operator (Px+yA)\x+y as replacements to the original 
space TL and the operator A. The Rayleigh-Ritz operator (P;rA)|;t on X using the 
original space TL and the operator A is the same as using the reduced space X + y 
and the operator (Px+yA)\x+y, as we have already discussed. 

If $\mathcal X$ is $A$-invariant, it is also $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$-invariant, corresponding to the same set of eigenvalues. Suppose this set of eigenvalues is the contiguous set of the largest eigenvalues of $A$, which forms the top of the spectrum of $A$. Then it is also the contiguous set of the largest eigenvalues of $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$. The latter may not be so evident in the infinite dimensional setting, so let us give and prove here the formal statement.

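Lemma 2.6. Let $A$ be a linear bounded Hermitian operator on an infinite dimensional Hilbert space $\mathcal H$. Let $\mathcal X$ be a nontrivial finite dimensional $A$-invariant subspace of $\mathcal H$ that corresponds to the top part of the spectrum of $A$, i.e., the smallest point of the spectrum of $(P_{\mathcal X}A)|_{\mathcal X}$ is an upper bound for the largest point of the spectrum of $(P_{\mathcal X^\perp}A)|_{\mathcal X^\perp}$. Then for any nontrivial finite dimensional subspace $\mathcal Y$ of $\mathcal H$, the subspace $\mathcal X$ is invariant for the Hermitian operator $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$. Moreover, the spectrum of the restriction of $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$ to $\mathcal X$ comprises the $\dim\mathcal X$ largest eigenvalues of $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$.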

Proof. The spectrum of a bounded Hermitian operator is a closed bounded set on the real line. Since $\mathcal X$ is finite dimensional, the spectrum $\Lambda((P_{\mathcal X}A)|_{\mathcal X})$ consists of $\dim\mathcal X$ eigenvalues, counting the multiplicities. Since $\mathcal X$ is $A$-invariant, the spectrum $\Lambda((P_{\mathcal X}A)|_{\mathcal X})$ is a subset of the spectrum of $A$, which by the lemma assumption forms the top part of the spectrum of $A$. The subspace $\mathcal X$ is $A$-invariant by assumption and is evidently $P_{\mathcal X+\mathcal Y}$-invariant, so it is also $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$-invariant. Thus, $\Lambda((P_{\mathcal X}A)|_{\mathcal X})$ is a subset of $\Lambda((P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y})$, counting the multiplicities. Here the spectrum of $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$ consists of $\dim(\mathcal X+\mathcal Y)$ eigenvalues, counting the multiplicities, since both $\mathcal X$ and $\mathcal Y$, and thus their sum $\mathcal X+\mathcal Y$, are finite dimensional.

The only somewhat nontrivial part of the proof is establishing that the spectrum of the restriction of $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$ to $\mathcal X$ comprises the $\dim\mathcal X$ largest eigenvalues of $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$. This relies on the lemma assumption that $\mathcal X$ is an $A$-invariant subspace corresponding to the top part of the spectrum of $A$. In other words, adding $\mathcal Y$ to $\mathcal X$ does not add any new eigenvalues above $i = \dim\mathcal X$. We already know two facts about $\Lambda((P_{\mathcal X}A)|_{\mathcal X})$. On the one hand, it makes up the top $\dim\mathcal X$ points of the spectrum of $A$, which are eigenvalues, counting the multiplicities. On the other hand, it is a subset of $\Lambda((P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y})$. Let $i = \dim\mathcal X$. The $i$-th eigenvalue of $(P_{\mathcal X}A)|_{\mathcal X}$ is at the same time the $i$-th top point of the spectrum of $A$, counting the multiplicity of eigenvalues. We only need to show that it bounds from above the $(i+1)$-th eigenvalue of $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$. But $\Lambda((P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y})$ is a vector of Ritz values of $A$ on the trial subspace $\mathcal X+\mathcal Y$. This bound therefore follows directly from the inf-sup principle for arbitrary Hermitian (not necessarily compact) operators; see, e.g., [6, Chapter II, Section 7] and [18, Theorem XIII.1]. □
We note that the assumptions of Lemma 2.6 are, of course, not applicable to all bounded Hermitian operators. E.g., Lemma 2.6 cannot be applied to an orthogonal projector with an infinite dimensional range. It rather covers the class of operators whose top part of the spectrum is discrete. This class is a modest, but practically important, extension of the class of compact operators; see again [6, Chapter II, Section 7] and [18, Theorem XIII.1, p. 76]. We finally note that the assumption of boundedness (below) of $A$ is not essential. It can easily be replaced with the assumption that the subspace $\mathcal X+\mathcal Y$ is in the domain of the definition of the corresponding quadratic form.

The arguments above allow us to substitute the original infinite dimensional $\mathcal H$ and $A$ with the finite dimensional $\mathcal X+\mathcal Y$ and $(P_{\mathcal X+\mathcal Y}A)|_{\mathcal X+\mathcal Y}$ in Theorem 2.4.

Theorem 2.7. The infinite dimensional, $\dim\mathcal H = \infty$, versions of Theorem 2.4 and its corollary (2.10) hold under the assumptions of Lemma 2.6.

3. Application to the FEM. In the FEM context, see, e.g., [2, 3, 14], let us consider a specific example, the clamped membrane vibration problem. This is a well known eigenvalue problem for the negative Laplacian operator $-\Delta$ in two dimensions. Let the membrane be a non-convex polygon $\Omega$ with a single reentrant corner $\omega \in (\pi, 2\pi)$. We will use the standard Sobolev spaces $H_0^1(\Omega)$ of functions satisfying the homogeneous Dirichlet conditions on the boundary of $\Omega$, and $H^{1+\alpha}(\Omega)$ with $\alpha > 0$.

We set $\mathcal{H} = \mathring{H}^1(\Omega)$ and define our operator $A$ as, informally speaking, the inverse of the negative Laplacian (see, e.g., [14]), so that $A > 0$ is compact in $\mathcal{H}$. Let us highlight that in this context we use the $\mathcal{H} = \mathring{H}^1(\Omega)$ scalar product in the definition of the angles to bound the largest eigenvalues of $A$, which are the reciprocals of the smallest eigenvalues of the negative Laplacian. We are looking for an approximation of the invariant subspace $\mathcal{X} \subset \mathring{H}^1(\Omega)$ of $A$, corresponding to the main membrane vibration modes, within a trial subspace $\mathcal{Y} \subset \mathring{H}^1(\Omega)$ by the Rayleigh-Ritz method. Using the simplest FEM setup, the domain $\Omega$ is triangulated according to traditional assumptions, and $\mathcal{Y}$ consists of all continuous functions that are linear on each triangle and satisfy the homogeneous Dirichlet conditions on the boundary $\partial\Omega$. The largest linear size of the triangles is denoted by $h$. It holds that $0 < \lambda_{\min}(\mathcal{X}+\mathcal{Y}) \to 0$ as $h \to 0$, so we replace $\lambda_{\min}(\mathcal{X}+\mathcal{Y})$ with its lower bound $0$ in (2.10) and Theorem 2.4.
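As a hypothetical illustration of this Rayleigh-Ritz setup (a 1D analog, not the membrane problem itself), the sketch below discretizes $-u'' = \mu u$ on $(0,1)$ with homogeneous Dirichlet conditions by continuous piecewise linear elements on a uniform mesh. Assuming the $\mathring{H}^1$ scalar product $\int u'v'\,dx$, the Rayleigh quotient of $A = (-d^2/dx^2)^{-1}$ at a finite element function with nodal vector $c$ equals $c^TMc/c^TKc$, where $K$ and $M$ are the stiffness and mass matrices, so the Ritz values are the eigenvalues $\nu$ of $Mc = \nu Kc$. An interval has no reentrant corner, so the eigenfunctions are smooth ($\alpha = 1$) and the errors should scale like $h^{2\alpha} = h^2$. The function names are our own; only NumPy is assumed.

```python
import numpy as np

def p1_matrices(n_interior):
    """Stiffness K and mass M for continuous piecewise linear elements on a
    uniform mesh of (0, 1) with homogeneous Dirichlet conditions."""
    h = 1.0 / (n_interior + 1)
    e = np.ones(n_interior - 1)
    K = (2.0 * np.eye(n_interior) - np.diag(e, 1) - np.diag(e, -1)) / h
    M = (4.0 * np.eye(n_interior) + np.diag(e, 1) + np.diag(e, -1)) * h / 6.0
    return K, M, h

def ritz_values_inverse_laplacian(n_interior, k):
    """Largest k Ritz values of A = (-d^2/dx^2)^{-1}, i.e., eigenvalues nu of
    M c = nu K c, computed via the Cholesky factorization K = L L^T."""
    K, M, h = p1_matrices(n_interior)
    L = np.linalg.cholesky(K)
    Linv = np.linalg.inv(L)
    nu = np.linalg.eigvalsh(Linv @ M @ Linv.T)   # ascending order
    return nu[::-1][:k], h

exact = 1.0 / (np.pi * np.arange(1, 3)) ** 2      # eigenvalues 1/(k pi)^2 of A
for n in (20, 41):
    ritz, h = ritz_values_inverse_laplacian(n, 2)
    err = exact - ritz                           # nonnegative: Ritz values approximate from below
    print(f"h = {h:.4f}, errors = {err}, errors / h^2 = {err / h**2}")
```

For this uniform mesh the Ritz values are also known in closed form, $\nu_k = h^2(2+\cos k\pi h)/(6(1-\cos k\pi h))$, and to leading order the errors behave like $h^2/12$.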

The angles on the right-hand sides of our eigenvalue error bounds characterize the approximability of the target invariant subspace $\mathcal{X}$ by finite element functions from $\mathcal{Y}$, which is typically measured by $Ch^\alpha$, where $C$ is a generic constant, $h$ approaches zero, and the exponent $\alpha$ describes the approximation order. The approximability is determined by the type of the FEM, the smoothness of functions in $\mathcal{X}$, and the choice of the space $\mathcal{H}$. For our example, the approximability bound for a function $v \in H^{1+\alpha}(\Omega)$ with some $\alpha \in (0,1]$ is $\sin\theta(v,\mathcal{Y}) \le Ch^\alpha \|v\|_{H^{1+\alpha}}/\|v\|_{H^1}$. The actual lower bound for $\alpha$, which is $\pi/\omega - \epsilon$ for an arbitrarily small $\epsilon > 0$, is determined by the angle $\omega$ of the reentrant corner of the polygon $\Omega$, which may lead to a corner singularity in eigenfunctions. The upper bound, 1, comes from the use of the piecewise linear FEM.

Let us consider a particular case where $\dim\mathcal{X} = 2$, denoting the largest eigenvalues by $\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) = [\lambda_1, \lambda_2]$ and the corresponding $\mathcal{H}$-normalized eigenfunctions in $\mathcal{X}$ by $v_1$ and $v_2$. Typically, both eigenfunctions $v_1$ and $v_2$ would have similar singularities at the reentrant corner, so both $v_1, v_2 \in H^{1+\alpha}(\Omega)$ with $\alpha = \pi/\omega - \epsilon$, but one of their linear combinations, e.g. (for illustrative purposes) $v_1 - v_2$, might have the full $H^2$ regularity, i.e., $v_1 - v_2 \in H^2(\Omega)$, and so by the approximability result we have $\sin\theta(v_1 - v_2, \mathcal{Y}) \le Ch$. Thus, $\sin\Theta(\mathcal{X},\mathcal{Y}) \le C[h^\alpha, h]$; here and below we neglect terms of a smaller order of magnitude in $h$ than the terms kept. To make the example concrete, let us simply assume that $\sin\Theta(\mathcal{X},\mathcal{Y}) = [h^\alpha, h]$.
This assumption may not be practical for our specific membrane problem. However, [3] gives examples where eigenfunctions corresponding to the same (multiple) eigenvalue have different regularities. A perturbation argument then shows that our assumption on the regularity of linear combinations of eigenfunctions is realistic.

Using the notation $\Lambda_{\dim\mathcal{X}}((P_{\mathcal{Y}}A)|_{\mathcal{Y}}) = [\lambda_1', \lambda_2']$ for the $\dim\mathcal{X} = 2$ largest FEM Ritz values, we obtain from (2.10), as in [14], that

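$$0 \le \lambda_1 - \lambda_1' \le \lambda_1 h^{2\alpha} \quad\text{and}\quad 0 \le \lambda_2 - \lambda_2' \le \lambda_2 h^{2\alpha}, \tag{3.1}$$
while (2.7) implies the bound for the error in the trace,
$$0 \le \lambda_1 + \lambda_2 - \lambda_1' - \lambda_2' \le \lambda_1 h^{2\alpha} + \lambda_2 h^2 \approx \lambda_1 h^{2\alpha}, \tag{3.2}$$
and (2.8) gives the bound for the error in the product,
$$0 \le \frac{\lambda_1\lambda_2}{\lambda_1'\lambda_2'} - 1 \le (1 + h^{2\alpha})(1 + h^2) - 1 \approx h^{2\alpha}. \tag{3.3}$$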

The standard bound (3.1) implies (3.2) and (3.3) only with an extra factor 2 on the right-hand side. We conclude that (2.10) cannot take advantage of the better approximability of the function $v_1 - v_2$ in this example, while our new majorization bounds (2.7) and (2.8) can, and they improve the constant in the trace and product error bounds by the factor $\dim\mathcal{X} = 2$.
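The factor-2 comparison above can be made concrete with a few lines of arithmetic. The sketch below uses hypothetical values $\lambda_1 = \lambda_2 = 1$ and $\alpha = 2/3$ (a reentrant corner with $\omega = 3\pi/2$, neglecting $\epsilon$) and compares the trace and product bounds obtained from the componentwise bounds (3.1) with the majorization bounds (3.2) and (3.3). For the product, (3.1) gives $\lambda_i' \ge \lambda_i(1 - h^{2\alpha})$, hence $\lambda_1\lambda_2/(\lambda_1'\lambda_2') \le (1 - h^{2\alpha})^{-2}$.

```python
def trace_bounds(lam1, lam2, h, alpha):
    """Upper bounds for lam1 + lam2 - lam1' - lam2'."""
    from_31 = (lam1 + lam2) * h ** (2 * alpha)         # summing the two bounds in (3.1)
    from_32 = lam1 * h ** (2 * alpha) + lam2 * h ** 2   # majorization bound (3.2)
    return from_31, from_32

def product_bounds(h, alpha):
    """Upper bounds for lam1 * lam2 / (lam1' * lam2') - 1."""
    from_31 = (1 - h ** (2 * alpha)) ** -2 - 1          # from lam_i' >= lam_i (1 - h^(2 alpha))
    from_33 = (1 + h ** (2 * alpha)) * (1 + h ** 2) - 1  # majorization bound (3.3)
    return from_31, from_33

alpha = 2 / 3   # hypothetical: omega = 3*pi/2, epsilon neglected
for h in (1e-1, 1e-2, 1e-3):
    t31, t32 = trace_bounds(1.0, 1.0, h, alpha)
    p31, p33 = product_bounds(h, alpha)
    print(f"h={h:g}: trace ratio {t31 / t32:.3f}, product ratio {p31 / p33:.3f}")
```

For these values both ratios are below 2 and increase toward 2 as $h$ decreases.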

Conclusions. Majorization is a powerful tool that gives elegant and general error bounds for eigenvalues approximated by the Rayleigh-Ritz method. We obtain several new results of this kind, including multiplicative bounds for relative errors. We apply majorization, apparently for the first time, in the context of FEM error bounds. Our initial results are promising, and we expect them to lead to further development of the majorization technique in the theory of eigenvalue computations.

4. Appendix. This appendix collects facts on majorization and angles, together with most of the proofs.

4.1. Weak Majorization. For a real vector $a = [a_1, \ldots, a_n]$, let $a^\downarrow$ be obtained by rearranging the entries of $a$ in algebraically decreasing order, $a_1^\downarrow \ge \cdots \ge a_n^\downarrow$. We denote $|a| = [|a_1|, \ldots, |a_n|]$ and $a^+ = \max\{a, 0\}$, taken componentwise. We say that the vector $b$ weakly majorizes the vector $a$, and we use the notation $[a_1, \ldots, a_n] \prec_w [b_1, \ldots, b_n]$ or $a \prec_w b$, if $\sum_{i=1}^k a_i^\downarrow \le \sum_{i=1}^k b_i^\downarrow$, $k = 1, \ldots, n$. If in addition the sums above for $k = n$ are equal, $b$ (strongly) majorizes $a$, which is denoted by $a \prec b$. Nonnegative vectors of different sizes may be compared by appending or removing zeros to match the sizes.
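The definitions above translate directly into partial-sum tests. The following is a minimal sketch in Python with NumPy, assuming a small absolute tolerance for rounding; the helper names `weakly_majorized` and `majorized` are our own, and shorter vectors are padded with zeros, which is meaningful only for nonnegative vectors, as noted above.

```python
import numpy as np

def _pad(v, n):
    """Append zeros so that v has length n (meant for nonnegative vectors)."""
    v = np.asarray(v, dtype=float)
    return np.pad(v, (0, n - v.size))

def weakly_majorized(a, b, tol=1e-12):
    """True if a is weakly majorized by b: every partial sum of the
    decreasingly sorted entries of a is at most the one of b."""
    n = max(len(a), len(b))
    ca = np.cumsum(np.sort(_pad(a, n))[::-1])
    cb = np.cumsum(np.sort(_pad(b, n))[::-1])
    return bool(np.all(ca <= cb + tol))

def majorized(a, b, tol=1e-12):
    """True if a is (strongly) majorized by b: weak majorization plus
    equal total sums."""
    return weakly_majorized(a, b, tol) and abs(np.sum(a) - np.sum(b)) <= tol

print(weakly_majorized([1, 1], [2, 0]), majorized([1, 1], [2, 0]))          # True True
print(weakly_majorized([0.5, 0.5], [2, 0]), majorized([0.5, 0.5], [2, 0]))  # True False
print(weakly_majorized([2, 0], [1, 1]))                                     # False
```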

The additive majorization statement $x - y \prec_w z$ for $n$-vectors $x = x^\downarrow$, $y = y^\downarrow$, and $z = z^\downarrow$ is equivalent to
$$\sum_{j=1}^k x_{i_j} \le \sum_{j=1}^k z_j + \sum_{j=1}^k y_{i_j}, \quad \forall k: 1 \le k \le n, \quad \forall i_j: 1 \le i_1 < \cdots < i_k \le n,$$
and $x - y \prec z$ holds if, in addition, the case $k = n$ gives equality. We write $\log x - \log y \prec_w \log z$ if
$$\prod_{j=1}^k x_{i_j} \le \prod_{j=1}^k z_j\, y_{i_j}, \quad \forall k: 1 \le k \le n, \quad \forall i_j: 1 \le i_1 < \cdots < i_k \le n,$$
and $\log x - \log y \prec \log z$ if, in addition, the case $k = n$ gives equality, for nonnegative vectors $x = x^\downarrow$, $y = y^\downarrow$, and $z = z^\downarrow$. For strictly positive vectors this follows directly from the definition of (weak) majorization.
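The equivalence between $x - y \prec_w z$ and the index-subset inequalities can be checked by brute force on small random vectors. The sketch below (hypothetical helper names, NumPy and `itertools` only) sorts each vector decreasingly, as the statement requires, and compares both formulations; since it enumerates all index sets, it is only meant for tiny $n$.

```python
import itertools
import numpy as np

def weakly_majorized(a, b, tol=1e-12):
    """Sorted partial-sum test for equal-length real vectors a and b."""
    ca = np.cumsum(np.sort(a)[::-1])
    cb = np.cumsum(np.sort(b)[::-1])
    return bool(np.all(ca <= cb + tol))

def subset_form(x, y, z, tol=1e-12):
    """Index-subset inequalities for decreasingly sorted x, y, z:
    sum_j x[i_j] <= sum_{j<k} z[j] + sum_j y[i_j] for all index sets."""
    n = len(x)
    for k in range(1, n + 1):
        zk = z[:k].sum()
        for idx in itertools.combinations(range(n), k):
            idx = list(idx)
            if x[idx].sum() > zk + y[idx].sum() + tol:
                return False
    return True

rng = np.random.default_rng(0)
agree = 0
for _ in range(200):
    x, y, z = [np.sort(rng.normal(size=5))[::-1] for _ in range(3)]
    agree += weakly_majorized(x - y, z) == subset_form(x, y, z)
print(agree, "of 200 random cases agree")
```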






ANDREW V. KNYAZEV and MERICO E. ARGENTATI 



We need several simple general facts on weak majorization. If nonnegative vectors $a$, $b$, and $c$ are decreasing and of the same size, then $a \prec_w b$ implies $ac \prec_w bc$ (products taken componentwise), but the converse is not true in general. If $a \prec_w b \le c$, then $a \prec_w c$. Concatenation holds, i.e., $a \prec c$ and $b \prec d$ imply $[a, b] \prec [c, d]$; [4, Corollary II.1.4, p. 31]. If $a \prec_w b$ and $c \prec_w d$, then $a + c \prec_w b + d$ for real vectors, provided that the bounds $b$ and $d$ are ordered in the same way; [17, Prop. 4.A.1.b]. For a convex increasing function $g(t)$ (e.g., $g(t) = e^t$), $a \prec_w b$ implies $g(a) \prec_w g(b)$; [17, Prop. 4.B.2, p. 109]. Trivially, $a \prec_w a^+$.
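Some of these facts can be spot-checked numerically. The sketch below builds $a \prec b$ as $a = Db$ with a random doubly stochastic matrix $D$ (a convex combination of permutation matrices; by the Hardy-Littlewood-Pólya theorem such an $a$ is majorized by $b$), then checks concatenation, the convex-function rule with $g(t) = e^t$, and $a \prec_w a^+$. The helper names are hypothetical, and this is a numerical illustration, not a proof.

```python
import numpy as np

def weakly_majorized(a, b, tol=1e-10):
    ca = np.cumsum(np.sort(a)[::-1])
    cb = np.cumsum(np.sort(b)[::-1])
    return bool(np.all(ca <= cb + tol))

def majorized(a, b, tol=1e-10):
    return weakly_majorized(a, b, tol) and abs(np.sum(a) - np.sum(b)) <= tol

def random_doubly_stochastic(n, rng, terms=4):
    """Convex combination of random permutation matrices."""
    weights = rng.random(terms)
    weights /= weights.sum()
    D = np.zeros((n, n))
    for w in weights:
        D += w * np.eye(n)[rng.permutation(n)]
    return D

rng = np.random.default_rng(1)
b, d = rng.normal(size=5), rng.normal(size=3)
a = random_doubly_stochastic(5, rng) @ b   # a is majorized by b
c = random_doubly_stochastic(3, rng) @ d   # c is majorized by d
print(majorized(a, b), majorized(c, d))                           # True True
print(majorized(np.concatenate([a, c]), np.concatenate([b, d])))  # concatenation: True
print(weakly_majorized(np.exp(a), np.exp(b)))                     # g(t) = e^t: True
```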

Let $S(A)$ denote the vector of all singular values of the matrix $A$ in decreasing order, and, for $A$ with real eigenvalues, let $\Lambda(A)$ denote the vector of all eigenvalues of $A$ in decreasing order. The following theorems are mostly known; see, e.g., [4, 17].

Theorem 4.1 (Lidskii). $\Lambda(A) - \Lambda(B) \prec \Lambda(A - B)$ for Hermitian $A$ and $B$.
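A quick numerical illustration of Lidskii's theorem, as a sketch with random complex Hermitian matrices (the helper names are ours; `numpy.linalg.eigvalsh` returns eigenvalues in ascending order, so we reverse them). Note that $\Lambda(A) - \Lambda(B)$ is formed index by index from the decreasingly sorted spectra.

```python
import numpy as np

def eig_desc(M):
    """Eigenvalues of a Hermitian matrix in decreasing order."""
    return np.linalg.eigvalsh(M)[::-1]

def majorized(a, b, tol=1e-10):
    ca = np.cumsum(np.sort(a)[::-1])
    cb = np.cumsum(np.sort(b)[::-1])
    return bool(np.all(ca <= cb + tol)) and abs(ca[-1] - cb[-1]) <= tol

def random_hermitian(n, rng):
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (G + G.conj().T) / 2

def lidskii_holds(n, seed):
    rng = np.random.default_rng(seed)
    A, B = random_hermitian(n, rng), random_hermitian(n, rng)
    return majorized(eig_desc(A) - eig_desc(B), eig_desc(A - B))

print(all(lidskii_holds(6, seed) for seed in range(100)))
```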

Theorem 4.2. $\log S(AB) - \log S(B) \prec \log S(A)$ for general $A$ and $B$, where we append zeros to the vectors of singular values if necessary to match the sizes.

Proof. For square matrices (or operators within the same space) this is the classical Gelfand-Naimark theorem [4, Theorem III.4.5]. Non-square matrices are extended with zero blocks to obtain square matrices. The extension with zero blocks only appends zero singular values and does not change the ranks. □

Theorem 4.3. $S(AB) \prec_w S(A)S(B)$ for general $A$ and $B$, where we append zeros to the vectors of singular values if necessary to match the sizes.

Proof. We add $\log S(B)$ to both sides of the statement of Theorem 4.2 and take the exponential function. □
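Theorems 4.2 and 4.3 can be spot-checked as follows. This is a minimal sketch with random real matrices: Theorem 4.3 is tested with non-square factors and zero padding, while Theorem 4.2 is tested only for square nonsingular factors, where all logarithms are finite. The helper names are our own.

```python
import numpy as np

def svals(M, size):
    """Singular values in decreasing order, padded with zeros to `size`."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.pad(s, (0, size - s.size))

def weakly_majorized(a, b, tol=1e-10):
    ca = np.cumsum(np.sort(a)[::-1])
    cb = np.cumsum(np.sort(b)[::-1])
    return bool(np.all(ca <= cb + tol))

def majorized(a, b, tol=1e-10):
    return weakly_majorized(a, b, tol) and abs(np.sum(a) - np.sum(b)) <= tol

rng = np.random.default_rng(3)

# Theorem 4.3 with non-square factors: S(AB) weakly majorized by S(A) S(B).
A, B = rng.normal(size=(4, 6)), rng.normal(size=(6, 3))
thm43 = weakly_majorized(svals(A @ B, 6), svals(A, 6) * svals(B, 6))

# Theorem 4.2 for square nonsingular factors.
A, B = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))
thm42 = majorized(np.log(svals(A @ B, 5)) - np.log(svals(B, 5)), np.log(svals(A, 5)))
print(thm43, thm42)
```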

Our next theorem generalizes Theorem 4.2 and improves [16, Corollary 2.4]. 

Theorem 4.4. $\log S(ABC) - \log S(B) \prec \log(S(A)S(C))$ for general $A$, $B$, and $C$, where we append zeros to singular values if necessary to match the sizes.

Proof. Theorem 4.2 can also be formulated as $\log S(AB) - \log S(A) \prec \log S(B)$, since the singular values of $AB$ and $(AB)^H = B^HA^H$ are the same, so Theorem 4.2 gives both $\log S(ABC) - \log S(BC) \prec \log S(A)$ and $\log S(BC) - \log S(B) \prec \log S(C)$. As the right-hand sides in these majorization statements are ordered in the same manner, we can add the statements, obtaining the claim of the theorem. □
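The following sketch (random square real matrices, hypothetical helper names) checks Theorem 4.4 and also the fact that the reformulation of Theorem 4.2 relies on: $S(AB) = S((AB)^H) = S(B^HA^H)$, whereas $S(AB)$ and $S(BA)$ generally differ.

```python
import numpy as np

def svals(M):
    return np.linalg.svd(M, compute_uv=False)   # decreasing order

def majorized(a, b, tol=1e-9):
    ca = np.cumsum(np.sort(a)[::-1])
    cb = np.cumsum(np.sort(b)[::-1])
    return bool(np.all(ca <= cb + tol)) and abs(ca[-1] - cb[-1]) <= tol

rng = np.random.default_rng(4)
A, B, C = (rng.normal(size=(5, 5)) for _ in range(3))

# S(AB) equals S((AB)^H) = S(B^H A^H), but in general differs from S(BA).
same_as_adjoint = np.allclose(svals(A @ B), svals(B.T @ A.T))
same_as_swapped = np.allclose(svals(A @ B), svals(B @ A))

# Theorem 4.4: log S(ABC) - log S(B) is majorized by log S(A) + log S(C).
thm44 = majorized(np.log(svals(A @ B @ C)) - np.log(svals(B)),
                  np.log(svals(A)) + np.log(svals(C)))
print(same_as_adjoint, same_as_swapped, thm44)
```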

We also need the following generalized pinching inequality, which may be new.

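Theorem 4.5. For matrices $A_1$, $A_2$, $B$, $C_1$, and $C_2$, such that all the products $A_i^HBC_j$ exist for $i, j = 1, 2$, we have, possibly up to zeros,
$$\bigl[S(A_1^HBC_1),\, S(A_2^HBC_2)\bigr] \prec_w S\Bigl(\sqrt{A_1A_1^H + A_2A_2^H}\; B\; \sqrt{C_1C_1^H + C_2C_2^H}\Bigr), \tag{4.1}$$
and in the case that $A_i = C_i$ and $B = B^H$, we have, possibly up to zeros,
$$\bigl[\Lambda(A_1^HBA_1),\, \Lambda(A_2^HBA_2)\bigr] \prec_w \Bigl(\Lambda\Bigl(\sqrt{A_1A_1^H + A_2A_2^H}\; B\; \sqrt{A_1A_1^H + A_2A_2^H}\Bigr)\Bigr)^{+}. \tag{4.2}$$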



Proof. We denote $A = [A_1\ A_2]$ and $C = [C_1\ C_2]$ and form the 2-by-2 block matrix
$$D = A^HBC = \begin{bmatrix} A_1^HBC_1 & A_1^HBC_2 \\ A_2^HBC_1 & A_2^HBC_2 \end{bmatrix}.$$
By the standard pinching inequality, e.g., [4, Problem II.5.4], the combined singular values of the diagonal blocks of the matrix $D$ are weakly majorized by the singular values of $D$. Since the nonzero eigenvalues of a matrix product do not depend on the order of the multipliers, the singular values $S(D)$ are, up to zeros, the same as
$$S\bigl(\sqrt{AA^H}\,B\sqrt{CC^H}\bigr) = S\Bigl(\sqrt{A_1A_1^H + A_2A_2^H}\; B\; \sqrt{C_1C_1^H + C_2C_2^H}\Bigr),$$
giving (4.1).
In the case that $A_i = C_i$ and $B = B^H$, the eigenvalues of the diagonal blocks of $D$ (which are now square) are strongly majorized by the eigenvalues of $D$. For the latter, we have, up to zeros, $\Lambda(D) = \Lambda(A^HBA) = \Lambda(AA^HB) = \Lambda\bigl(\sqrt{AA^H}\,B\sqrt{AA^H}\bigr)$. Appending or removing zeros preserves majorization for nonnegative vectors, so we use $\Lambda(D) \prec_w (\Lambda(D))^+$ to replace $\Lambda(D)$ with $(\Lambda(D))^+$ in the formulas above, which proves (4.2). □

If $\operatorname{size}[A_1\ A_2] = \operatorname{size}B$, no zeros appear, so (4.2) holds as a strong majorization and without the $+$ operation, as in the standard pinching inequality.
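Inequality (4.2) can be spot-checked on random data. In the sketch below (hypothetical helper names, real symmetric $B$), we take $k_1 + k_2 \ge n$ columns in $A_1, A_2$, so that only the nonnegative right-hand side needs zero padding, and compute $R = \sqrt{A_1A_1^H + A_2A_2^H}$ from an eigendecomposition.

```python
import numpy as np

def eig_desc(M):
    return np.linalg.eigvalsh(M)[::-1]

def psd_sqrt(M):
    """Symmetric square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def weakly_majorized(a, b, tol=1e-10):
    """Weak majorization; the shorter vector is padded with zeros at the end."""
    n = max(len(a), len(b))
    a = np.pad(a, (0, n - len(a)))
    b = np.pad(b, (0, n - len(b)))
    ca = np.cumsum(np.sort(a)[::-1])
    cb = np.cumsum(np.sort(b)[::-1])
    return bool(np.all(ca <= cb + tol))

def pinching_42_holds(n, k1, k2, seed):
    """Check [Lambda(A1^T B A1), Lambda(A2^T B A2)] against (Lambda(R B R))^+."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(n, n))
    B = (G + G.T) / 2
    A1, A2 = rng.normal(size=(n, k1)), rng.normal(size=(n, k2))
    lhs = np.concatenate([eig_desc(A1.T @ B @ A1), eig_desc(A2.T @ B @ A2)])
    R = psd_sqrt(A1 @ A1.T + A2 @ A2.T)
    rhs = np.maximum(eig_desc(R @ B @ R), 0.0)   # the "+" operation
    return weakly_majorized(lhs, rhs)

print(all(pinching_42_holds(5, 3, 4, seed) for seed in range(100)))
```

With $A_1 = \sqrt{A}$ and $A_2 = \sqrt{I - A}$ for $0 \le A \le I$, as in the proof of Theorem 2.1 below, $R = I$ and the right-hand side reduces to $(\Lambda(B))^+$.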

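4.2. Principal Angles Between Subspaces. We need the following results.

Theorem 4.6. [10, Theorem 3.4] Let $\dim\mathcal{X} = \dim\mathcal{Y}$. Then we have the equalities
$$\Lambda(P_{\mathcal{X}}P_{\mathcal{Y}^\perp}P_{\mathcal{X}}) = S^2(P_{\mathcal{Y}^\perp}P_{\mathcal{X}}) = S^2(P_{\mathcal{X}^\perp}P_{\mathcal{Y}}) = \bigl[\sin^2\Theta(\mathcal{X},\mathcal{Y}), 0, \ldots, 0\bigr].$$

Theorem 4.7. [13, Theorem 2.16] If $\dim\mathcal{X} = \dim\mathcal{Y} = p$, then the first, i.e., largest, $p$ components of the vector $\Lambda(P_{\mathcal{X}} - P_{\mathcal{Y}})$ are given by the vector $\sin\Theta(\mathcal{X},\mathcal{Y})$.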
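Both theorems can be illustrated numerically. The sketch below computes principal angles from the singular values (cosines) of $X^TY$ for orthonormal bases $X$, $Y$ of random subspaces of $\mathbb{R}^8$; the helper names are ours, and computing sines as $\sqrt{1 - \cos^2}$ loses accuracy for tiny angles, which is acceptable for this illustration.

```python
import numpy as np

def orth(M):
    """Orthonormal basis for the column space of a full-rank matrix."""
    Q, _ = np.linalg.qr(M)
    return Q

def sin_theta(X, Y):
    """Sines of principal angles between span(X) and span(Y), largest first."""
    cos = np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), 0.0, 1.0)
    return np.sort(np.sqrt(1.0 - cos ** 2))[::-1]

def eig_desc(M):
    return np.linalg.eigvalsh(M)[::-1]

rng = np.random.default_rng(5)
n, p = 8, 3
X, Y = orth(rng.normal(size=(n, p))), orth(rng.normal(size=(n, p)))
PX, PY, I = X @ X.T, Y @ Y.T, np.eye(n)
s = sin_theta(X, Y)
padded = np.concatenate([s ** 2, np.zeros(n - p)])

thm46 = (np.allclose(eig_desc(PX @ (I - PY) @ PX), padded)
         and np.allclose(np.linalg.svd((I - PY) @ PX, compute_uv=False) ** 2, padded)
         and np.allclose(np.linalg.svd((I - PX) @ PY, compute_uv=False) ** 2, padded))
thm47 = np.allclose(eig_desc(PX - PY)[:p], s)
print(thm46, thm47)
```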

4.3. Proofs. In this section we provide the main and relatively long proofs. 

4.3.1. Proof of Theorem 2.1. We start with two important simplifications.¹ First, by [12, Remark 4.1] we use the subspace $\mathcal{X}+\mathcal{Y}$ and the operator $(P_{\mathcal{X}+\mathcal{Y}}A)|_{\mathcal{X}+\mathcal{Y}}$ as substitutions for the original space $\mathcal{H}$ and the original operator $A$, keeping the same notation, without loss of generality. Second, the differences of Ritz values do not change with a shift of $A$, i.e., for a real $\alpha$ and $A_s = A - \alpha I$, we have, e.g.,
$$\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}}) = \Lambda((P_{\mathcal{X}}A_s)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A_s)|_{\mathcal{Y}}).$$
So all our statements are invariant with respect to a real shift of $A$, which we can freely choose. Also, our bounds are invariant with respect to a real scaling of $A$. Thus for any real $\alpha$ and $\beta \ne 0$ we can replace $A$ with $\beta(A - \alpha I)$ without loss of generality. We take $\alpha = \lambda_{\min}$ and $\beta = 1/(\lambda_{\max} - \lambda_{\min})$, so in the rest of the proof we assume that $A$ is already shifted and scaled such that $\lambda_{\min} = 0$ and $\lambda_{\max} = 1$, which guarantees well-defined square roots $\sqrt{A}$ and $\sqrt{I - A}$.

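We now prove that
$$\bigl|\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}})\bigr| \prec_w \sin\Theta(\mathcal{X},\mathcal{Y}), \tag{4.3}$$
and, if in addition $\mathcal{X}$ is $A$-invariant, then
$$\bigl|\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}})\bigr| \prec_w \sin^2\Theta(\mathcal{X},\mathcal{Y}). \tag{4.4}$$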
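Concatenating positive and negative values together, we obtain
$$\begin{aligned}
&\bigl[\,|\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}})|,\ -|\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}})|\,\bigr]^\downarrow\\
&\quad= \bigl[\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}}),\ -\bigl(\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}})\bigr)\bigr]^\downarrow\\
&\quad= \bigl[\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}}),\ \Lambda((P_{\mathcal{X}}(I - A))|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}(I - A))|_{\mathcal{Y}})\bigr]^\downarrow.
\end{aligned}$$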

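It is more convenient for us to work in the whole space, so we pad the last vector with zeros and rewrite it as
$$\begin{aligned}
&\bigl[\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}}),\ \Lambda((P_{\mathcal{X}}(I - A))|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}(I - A))|_{\mathcal{Y}}), 0, \ldots, 0\bigr]^\downarrow\\
&\quad= \bigl[\Lambda(P_{\mathcal{X}}AP_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}AP_{\mathcal{Y}}),\ \Lambda(P_{\mathcal{X}}(I - A)P_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}(I - A)P_{\mathcal{Y}})\bigr]^\downarrow,
\end{aligned}$$
using $[\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}), 0, \ldots, 0] = \Lambda(P_{\mathcal{X}}AP_{\mathcal{X}})$ and similar formulas involving $\mathcal{Y}$ instead of $\mathcal{X}$ and $I - A$ instead of $A$, which all hold since $A \ge 0$ and $I - A \ge 0$ in this proof, so the added zeros are correctly placed.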



¹ Ilya Lashuk's proof (private communication, unpublished) of bound (2.2) in the particular case where $A$ is an orthogonal projector has stimulated our present proof.
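At this point the proof splits for (4.3) and (4.4). To prove (4.3), in the first sub-vector $\Lambda(P_{\mathcal{X}}AP_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}AP_{\mathcal{Y}})$ we swap multipliers without changing the eigenvalues, e.g., $\Lambda(P_{\mathcal{X}}\sqrt{A}\sqrt{A}P_{\mathcal{X}}) = \Lambda(\sqrt{A}P_{\mathcal{X}}\sqrt{A})$, and use Theorem 4.1:
$$\Lambda(P_{\mathcal{X}}AP_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}AP_{\mathcal{Y}}) = \Lambda(\sqrt{A}P_{\mathcal{X}}\sqrt{A}) - \Lambda(\sqrt{A}P_{\mathcal{Y}}\sqrt{A}) \prec \Lambda\bigl(\sqrt{A}(P_{\mathcal{X}} - P_{\mathcal{Y}})\sqrt{A}\bigr),$$
and similarly for the second sub-vector,
$$\Lambda(P_{\mathcal{X}}(I - A)P_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}(I - A)P_{\mathcal{Y}}) \prec \Lambda\bigl(\sqrt{I - A}\,(P_{\mathcal{X}} - P_{\mathcal{Y}})\sqrt{I - A}\bigr).$$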

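We concatenate, as $a \prec c$ and $b \prec d$ imply $[a, b] \prec [c, d]$, and obtain
$$\begin{aligned}
&\bigl[\Lambda(P_{\mathcal{X}}AP_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}AP_{\mathcal{Y}}),\ \Lambda(P_{\mathcal{X}}(I - A)P_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}(I - A)P_{\mathcal{Y}})\bigr]\\
&\quad\prec \bigl[\Lambda\bigl(\sqrt{A}(P_{\mathcal{X}} - P_{\mathcal{Y}})\sqrt{A}\bigr),\ \Lambda\bigl(\sqrt{I - A}\,(P_{\mathcal{X}} - P_{\mathcal{Y}})\sqrt{I - A}\bigr)\bigr]\\
&\quad\prec_w \bigl[(\Lambda(P_{\mathcal{X}} - P_{\mathcal{Y}}))^+, 0, \ldots, 0\bigr]. \qquad \text{(by Theorem 4.5)}
\end{aligned}$$
Here we apply (4.2) with $A_1 = \sqrt{A}$, $A_2 = \sqrt{I - A}$, and $B = P_{\mathcal{X}} - P_{\mathcal{Y}}$, so that $A_1A_1^H + A_2A_2^H = I$.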



Picking up the $\dim\mathcal{X} = \dim\mathcal{Y}$ largest (nonnegative) elements on both sides of this majorization statement proves the weak majorization claim (4.3), since by Theorem 4.7 the $\dim\mathcal{X} = \dim\mathcal{Y}$ largest elements of $\Lambda(P_{\mathcal{X}} - P_{\mathcal{Y}})$ are equal to $\sin\Theta(\mathcal{X},\mathcal{Y})$.
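To prove (4.4), we use $A = P_{\mathcal{X}}AP_{\mathcal{X}} + P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp}$, valid since $\mathcal{X}$ is $A$-invariant, and notice that
$$\Lambda(P_{\mathcal{Y}}AP_{\mathcal{Y}}) = \Lambda(P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}}P_{\mathcal{Y}} + P_{\mathcal{Y}}P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp}P_{\mathcal{Y}}) \ge \Lambda(P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}}P_{\mathcal{Y}}), \tag{4.5}$$
as $P_{\mathcal{Y}}P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp}P_{\mathcal{Y}} \ge 0$, and similarly $\Lambda(P_{\mathcal{Y}}(I - A)P_{\mathcal{Y}}) \ge \Lambda(P_{\mathcal{Y}}P_{\mathcal{X}}(I - A)P_{\mathcal{X}}P_{\mathcal{Y}})$, so
$$\begin{aligned}
&\bigl[\Lambda(P_{\mathcal{X}}AP_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}AP_{\mathcal{Y}}),\ \Lambda(P_{\mathcal{X}}(I - A)P_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}(I - A)P_{\mathcal{Y}})\bigr]\\
&\quad\le \bigl[\Lambda(P_{\mathcal{X}}AP_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}}P_{\mathcal{Y}}),\ \Lambda(P_{\mathcal{X}}(I - A)P_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}P_{\mathcal{X}}(I - A)P_{\mathcal{X}}P_{\mathcal{Y}})\bigr]\\
&\quad\prec \bigl[\Lambda\bigl(\sqrt{A}P_{\mathcal{X}}P_{\mathcal{Y}^\perp}P_{\mathcal{X}}\sqrt{A}\bigr),\ \Lambda\bigl(\sqrt{I - A}\,P_{\mathcal{X}}P_{\mathcal{Y}^\perp}P_{\mathcal{X}}\sqrt{I - A}\bigr)\bigr]\\
&\quad\prec_w \bigl[\Lambda(P_{\mathcal{X}}P_{\mathcal{Y}^\perp}P_{\mathcal{X}}), 0, \ldots, 0\bigr] \qquad \text{(by Theorem 4.5, since } \Lambda(P_{\mathcal{X}}P_{\mathcal{Y}^\perp}P_{\mathcal{X}}) \ge 0\text{)}\\
&\quad= \bigl[\sin^2\Theta(\mathcal{X},\mathcal{Y}), 0, \ldots, 0\bigr]. \qquad \text{(by Theorem 4.6)}
\end{aligned}$$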

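Here, in the second line we again use that the eigenvalues of the matrix product do not depend on the order of the matrix multipliers, so we transform, e.g., in the first sub-vector, $\Lambda(P_{\mathcal{X}}AP_{\mathcal{X}}) = \Lambda(\sqrt{A}P_{\mathcal{X}}\sqrt{A})$ and $\Lambda(P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}}P_{\mathcal{Y}}) = \Lambda(\sqrt{A}P_{\mathcal{X}}P_{\mathcal{Y}}P_{\mathcal{X}}\sqrt{A})$. In the next line we independently apply Theorem 4.1 to each of the two sub-vectors, using $P_{\mathcal{X}} - P_{\mathcal{X}}P_{\mathcal{Y}}P_{\mathcal{X}} = P_{\mathcal{X}}P_{\mathcal{Y}^\perp}P_{\mathcal{X}}$.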
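A numerical spot-check of the two bounds just proved may help make them concrete. After undoing the shift and scaling, (4.3) and (4.4) read $|\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}})| \prec_w (\lambda_{\max} - \lambda_{\min})\sin\Theta(\mathcal{X},\mathcal{Y})$, and the same with $\sin^2\Theta$ for $A$-invariant $\mathcal{X}$. The sketch below uses the spread of the whole matrix, which is at least the spread of $A$ restricted to $\mathcal{X}+\mathcal{Y}$, so the checked inequalities are implied by the proved ones. Random test data and helper names are our own.

```python
import numpy as np

def orth(M):
    Q, _ = np.linalg.qr(M)
    return Q

def eig_desc(M):
    return np.linalg.eigvalsh(M)[::-1]

def sin_theta(X, Y):
    cos = np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), 0.0, 1.0)
    return np.sort(np.sqrt(1.0 - cos ** 2))[::-1]

def weakly_majorized(a, b, tol=1e-10):
    ca = np.cumsum(np.sort(a)[::-1])
    cb = np.cumsum(np.sort(b)[::-1])
    return bool(np.all(ca <= cb + tol))

def check_theorem_21(n, p, seed, invariant):
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(n, n))
    A = (G + G.T) / 2
    lam, V = np.linalg.eigh(A)
    spread = lam[-1] - lam[0]
    if invariant:   # X spanned by p randomly chosen eigenvectors of A
        X = V[:, rng.choice(n, size=p, replace=False)]
    else:
        X = orth(rng.normal(size=(n, p)))
    Y = orth(X + 0.3 * rng.normal(size=(n, p)))   # a perturbed trial subspace
    diff = np.abs(eig_desc(X.T @ A @ X) - eig_desc(Y.T @ A @ Y))
    s = sin_theta(X, Y)
    return weakly_majorized(diff, spread * (s ** 2 if invariant else s))

print(all(check_theorem_21(8, 3, seed, inv) for seed in range(50) for inv in (False, True)))
```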

4.3.2. Proof of Theorem 2.2. As in the previous proof, we start with two simplifications. The first one is the same: by [12, Remark 4.1] we use the subspace $\mathcal{X}+\mathcal{Y}$ and the operator $(P_{\mathcal{X}+\mathcal{Y}}A)|_{\mathcal{X}+\mathcal{Y}}$ as substitutions for the original space $\mathcal{H}$ and the original operator $A$, keeping the same notation, without loss of generality. Second, we choose the shift $\alpha = \min\{\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}})\}$ and assume that it is already applied to $A$, i.e., without loss of generality we assume that both $P_{\mathcal{X}}AP_{\mathcal{X}}$ and $P_{\mathcal{X}^\perp}(-A)P_{\mathcal{X}^\perp}$ are nonnegative definite, and so they have well-defined square roots $\sqrt{P_{\mathcal{X}}AP_{\mathcal{X}}}$ and $\sqrt{P_{\mathcal{X}^\perp}(-A)P_{\mathcal{X}^\perp}}$, respectively.
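For an $A$-invariant subspace $\mathcal{X}$, we split $A = P_{\mathcal{X}}AP_{\mathcal{X}} + P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp}$, and, adding and subtracting $\Lambda((P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}})|_{\mathcal{Y}})$, we derive
$$\begin{aligned}
0 &\le \Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}})\\
&= \Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda\bigl((P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}})|_{\mathcal{Y}} + (P_{\mathcal{Y}}P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp})|_{\mathcal{Y}}\bigr)\\
&= \Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}})|_{\mathcal{Y}})\\
&\quad + \Lambda((P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}})|_{\mathcal{Y}}) - \Lambda\bigl((P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}})|_{\mathcal{Y}} + (P_{\mathcal{Y}}P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp})|_{\mathcal{Y}}\bigr).
\end{aligned}$$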

Now we bound separately the two terms of the sum in the last two lines. We remind the reader that $a \prec b$ and $c \prec d$ imply $a + c \prec b^\downarrow + d^\downarrow$ for real vectors, and this holds similarly for weak majorization.

It is convenient to extend the operators' restrictions by zero to the whole space and to use the convention that operations and comparisons of nonnegative decreasing vectors with different numbers of components are done by appending zeros at the end of the vectors to match their sizes, e.g., $\Lambda(P_{\mathcal{X}}AP_{\mathcal{X}}) = [\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}), 0, \ldots, 0] \ge 0$. Since $\dim\mathcal{X} = \dim\mathcal{Y}$, the number of zeros to add for $\mathcal{X}$ and $\mathcal{Y}$ is the same. However, the seemingly trivial claim $\Lambda(P_{\mathcal{Y}}AP_{\mathcal{Y}}) = [\Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}}), 0, \ldots, 0]$ is, in fact, wrong, since the components of $\Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}})$ may not all be nonnegative, so the added zeros may be misplaced compared to $\Lambda(P_{\mathcal{Y}}AP_{\mathcal{Y}})$, which is decreasing by definition.

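We start with the first term of the sum on the right-hand side. Since both $P_{\mathcal{X}}AP_{\mathcal{X}} \ge 0$ and $P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}}P_{\mathcal{Y}} \ge 0$, we concatenate with zeros correctly and obtain
$$\begin{aligned}
\bigl[\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}})|_{\mathcal{Y}}), 0, \ldots, 0\bigr] &= \Lambda(P_{\mathcal{X}}AP_{\mathcal{X}}) - \Lambda(P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}}P_{\mathcal{Y}})\\
&\prec \Lambda\Bigl(\sqrt{P_{\mathcal{X}}AP_{\mathcal{X}}}\,P_{\mathcal{X}}P_{\mathcal{Y}^\perp}P_{\mathcal{X}}\sqrt{P_{\mathcal{X}}AP_{\mathcal{X}}}\Bigr)\\
&\prec_w S(P_{\mathcal{X}}AP_{\mathcal{X}})\sin^2\Theta(\mathcal{X},\mathcal{Y}),
\end{aligned}$$
applying Theorems 4.1, 4.3, and 4.6.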

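Considering the second term and again using Theorem 4.1, we get
$$\Lambda((P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}})|_{\mathcal{Y}}) - \Lambda\bigl((P_{\mathcal{Y}}P_{\mathcal{X}}AP_{\mathcal{X}})|_{\mathcal{Y}} + (P_{\mathcal{Y}}P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp})|_{\mathcal{Y}}\bigr) \prec \Lambda\bigl((P_{\mathcal{Y}}P_{\mathcal{X}^\perp}(-A)P_{\mathcal{X}^\perp})|_{\mathcal{Y}}\bigr).$$
By our assumption on the shift, we have $P_{\mathcal{X}^\perp}(-A)P_{\mathcal{X}^\perp} \ge 0$, so
$$\begin{aligned}
\bigl[\Lambda\bigl((P_{\mathcal{Y}}P_{\mathcal{X}^\perp}(-A)P_{\mathcal{X}^\perp})|_{\mathcal{Y}}\bigr), 0, \ldots, 0\bigr] &= \Lambda\bigl(P_{\mathcal{Y}}P_{\mathcal{X}^\perp}(-A)P_{\mathcal{X}^\perp}P_{\mathcal{Y}}\bigr)\\
&= \Lambda\Bigl(\sqrt{P_{\mathcal{X}^\perp}(-A)P_{\mathcal{X}^\perp}}\,P_{\mathcal{X}^\perp}P_{\mathcal{Y}}P_{\mathcal{X}^\perp}\sqrt{P_{\mathcal{X}^\perp}(-A)P_{\mathcal{X}^\perp}}\Bigr)\\
&\prec_w S\bigl(P_{\mathcal{X}^\perp}(-A)P_{\mathcal{X}^\perp}\bigr)\sin^2\Theta(\mathcal{X},\mathcal{Y}) \qquad \text{(by Theorems 4.3 and 4.6)}\\
&= S\bigl(P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp}\bigr)\sin^2\Theta(\mathcal{X},\mathcal{Y}).
\end{aligned}$$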

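Adding both bounds together gives the statement of the theorem, i.e.,
$$0 \le \Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) - \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}}) \prec_w \bigl(S(P_{\mathcal{X}}AP_{\mathcal{X}}) + S(P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp})\bigr)\sin^2\Theta(\mathcal{X},\mathcal{Y}).$$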

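Finally, there are $\dim(\mathcal{X}+\mathcal{Y}) - \dim\mathcal{Y}$ nonzero components in the vector $\Theta(\mathcal{X},\mathcal{Y})$, since $\dim(\mathcal{X}+\mathcal{Y}) + \dim(\mathcal{X}\cap\mathcal{Y}) = \dim\mathcal{X} + \dim\mathcal{Y}$. The first $\dim(\mathcal{X}+\mathcal{Y}) - \dim\mathcal{Y}$ components of the vector $S(P_{\mathcal{X}}AP_{\mathcal{X}}) + S(P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp})$ are the same as those of the vector $\mathrm{Spr}_{(\mathcal{X}+\mathcal{Y})}$, since we have redefined $A$ such that the sum $\mathcal{X}+\mathcal{Y}$ gives the whole space and shifted $A$ such that $\min\{\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}})\} = 0$, and so we have $S(P_{\mathcal{X}}AP_{\mathcal{X}}) = \Lambda(P_{\mathcal{X}}AP_{\mathcal{X}})$ and $S(P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp}) = -\Lambda^{\uparrow}(P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp})$, where $\Lambda^{\uparrow}$ lists the eigenvalues in increasing order. For each index larger than $\dim(\mathcal{X}+\mathcal{Y}) - \dim\mathcal{Y}$, the corresponding angle $\theta_i(\mathcal{X},\mathcal{Y})$ in the vector $\Theta(\mathcal{X},\mathcal{Y})$ must be zero, so such a component of $\mathrm{Spr}_{(\mathcal{X}+\mathcal{Y})}$ can be defined arbitrarily, since it is multiplied by zero.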
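The bound of Theorem 2.2 can be spot-checked in the same way. The sketch assumes, as the shift in the proof requires, that $\mathcal{X}$ is spanned by eigenvectors of the $p$ largest eigenvalues, so that $P_{\mathcal{X}^\perp}(-A)P_{\mathcal{X}^\perp} \ge 0$ after the shift. It applies the displayed bound in the whole space $\mathbb{R}^n$, without restricting to $\mathcal{X}+\mathcal{Y}$ first; the steps of the proof before its final paragraph do not rely on that restriction. Helper names and random data are hypothetical.

```python
import numpy as np

def orth(M):
    Q, _ = np.linalg.qr(M)
    return Q

def eig_desc(M):
    return np.linalg.eigvalsh(M)[::-1]

def svals(M):
    return np.linalg.svd(M, compute_uv=False)

def sin_theta(X, Y):
    cos = np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), 0.0, 1.0)
    return np.sort(np.sqrt(1.0 - cos ** 2))[::-1]

def weakly_majorized(a, b, tol=1e-10):
    n = max(len(a), len(b))
    a = np.pad(a, (0, n - len(a)))
    b = np.pad(b, (0, n - len(b)))
    ca = np.cumsum(np.sort(a)[::-1])
    cb = np.cumsum(np.sort(b)[::-1])
    return bool(np.all(ca <= cb + tol))

def check_theorem_22(n, p, seed):
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(n, n))
    A = (G + G.T) / 2
    lam, V = np.linalg.eigh(A)             # ascending eigenvalues
    X = V[:, -p:]                           # invariant subspace of the p largest
    Y = orth(X + 0.3 * rng.normal(size=(n, p)))
    diff = eig_desc(X.T @ A @ X) - eig_desc(Y.T @ A @ Y)
    As = A - lam[-p] * np.eye(n)            # shift so that min Lambda((P_X A)|_X) = 0
    PX = X @ X.T
    PXp = np.eye(n) - PX
    weights = svals(PX @ As @ PX) + svals(PXp @ As @ PXp)
    s2 = np.pad(sin_theta(X, Y) ** 2, (0, n - p))
    return bool(np.all(diff >= -1e-12)) and weakly_majorized(diff, weights * s2)

print(all(check_theorem_22(8, 3, seed) for seed in range(100)))
```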

4.3.3. Proof of Theorem 2.3. First we use exactly the same simplifications as in the beginning of the proof of Theorem 2.1 in subsection 4.3.1, so $\lambda_{\min}(\mathcal{X}+\mathcal{Y}) = 0$. We assume that the space $\mathcal{H}$ is already mapped into a space of vectors, so that we can use a matrix proof here. Let $X$ and $Y$ be two matrices whose columns form orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$, respectively, so we have $\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) = \Lambda(X^HAX)$ and $\Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}}) = \Lambda(Y^HAY)$.

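The theorem's assumptions $\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) > \lambda_{\min}(\mathcal{X}+\mathcal{Y})$ and $\Theta(\mathcal{X},\mathcal{Y}) < \pi/2$ give $\Lambda((P_{\mathcal{X}}A)|_{\mathcal{X}}) \ge \Lambda((P_{\mathcal{Y}}A)|_{\mathcal{Y}}) > \lambda_{\min}(\mathcal{X}+\mathcal{Y})$ by (2.6). In our simplified situation this is equivalent to $\Lambda(X^HAX) \ge \Lambda(Y^HAY) > 0$, so we can legitimately take the logarithm of their ratio below. By analogy with (4.5), since $\mathcal{X}$ is $A$-invariant and $P_{\mathcal{X}^\perp}AP_{\mathcal{X}^\perp} \ge 0$ because the shift of $A$ made $A \ge 0$, we have
$$\Lambda(Y^HAY) \ge \Lambda(Y^HP_{\mathcal{X}}AP_{\mathcal{X}}Y) = \Lambda\bigl((Y^HX)X^HAX(X^HY)\bigr) = \Lambda(C^HX^HAXC),$$
where we denote $C = X^HY$. The eigenvalues of $CC^H$ are the components of $\cos^2\Theta(\mathcal{X},\mathcal{Y})$, which are positive by the theorem's assumption, so both matrices $C$ and $C^H$ are invertible, and then $1 \le \Lambda(C^{-H}C^{-1}) = \cos^{-2}\Theta(\mathcal{X},\mathcal{Y})$. The key step is using Theorem 4.2, substituting $A := C^{-H}$ and $B := C^H\sqrt{X^HAX}$ in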



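$$\begin{aligned}
\log\frac{\Lambda(X^HAX)}{\Lambda(C^HX^HAXC)} &= 2\log\frac{S\bigl(\sqrt{X^HAX}\bigr)}{S\bigl(C^H\sqrt{X^HAX}\bigr)}\\
&= 2\bigl(\log S(AB) - \log S(B)\bigr)\\
&\prec 2\log S(A)\\
&= \log\Lambda(C^{-H}C^{-1})\\
&= \log\bigl(\cos^{-2}\Theta(\mathcal{X},\mathcal{Y})\bigr)\\
&= \log\bigl(1 + \tan^2\Theta(\mathcal{X},\mathcal{Y})\bigr).
\end{aligned} \tag{4.6}$$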

Replacing here $\Lambda(C^HX^HAXC)$ with $\Lambda(Y^HAY) \ge \Lambda(C^HX^HAXC)$, as shown above, gives the multiplicative weak majorization bound of Theorem 2.3.

If $x \prec y$, then $\phi(x) \prec_w \phi(y)$ for any nondecreasing convex real-valued function $\phi$; see, e.g., [17, Statement 4.B.2]. Taking $\phi(t) = e^t$ in (4.6) gives
$$\frac{\Lambda(X^HAX)}{\Lambda(Y^HAY)} \le \frac{\Lambda(X^HAX)}{\Lambda(Y^HP_{\mathcal{X}}AP_{\mathcal{X}}Y)} = \frac{\Lambda(X^HAX)}{\Lambda(C^HX^HAXC)} \prec_w 1 + \tan^2\Theta(\mathcal{X},\mathcal{Y}).$$
Subtracting the vector of ones gives the second bound of Theorem 2.3.
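Finally, a spot-check of the multiplicative bounds of Theorem 2.3, sketched for a symmetric positive definite $A$ and an $A$-invariant $\mathcal{X}$ spanned by randomly chosen eigenvectors. The argument above, from (4.5) onward, only uses $A \ge 0$, the invariance of $\mathcal{X}$, $\Lambda(Y^HAY) > 0$, and $\Theta(\mathcal{X},\mathcal{Y}) < \pi/2$, so we apply it directly to a positive definite $A$ without any shift; the random perturbation below is expected to keep all angles below $\pi/2$. Names and data are hypothetical.

```python
import numpy as np

def orth(M):
    Q, _ = np.linalg.qr(M)
    return Q

def eig_desc(M):
    return np.linalg.eigvalsh(M)[::-1]

def weakly_majorized(a, b, tol=1e-10):
    ca = np.cumsum(np.sort(a)[::-1])
    cb = np.cumsum(np.sort(b)[::-1])
    return bool(np.all(ca <= cb + tol))

def check_theorem_23(n, p, seed):
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(n, n))
    A = G @ G.T + 0.1 * np.eye(n)                     # symmetric positive definite
    _, V = np.linalg.eigh(A)
    X = V[:, rng.choice(n, size=p, replace=False)]    # an A-invariant subspace
    Y = orth(X + 0.3 * rng.normal(size=(n, p)))
    cos = np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), 0.0, 1.0)
    tan2 = (1.0 - cos ** 2) / cos ** 2                # tan^2 of the principal angles
    ratio = eig_desc(X.T @ A @ X) / eig_desc(Y.T @ A @ Y)
    return (weakly_majorized(np.log(ratio), np.log1p(tan2))
            and weakly_majorized(ratio - 1.0, tan2))

print(all(check_theorem_23(8, 3, seed) for seed in range(100)))
```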